Design-based Approach in Social Science Research
|01 December 2016
|Anustubh Agnihotri,Rahul Verma
|Notes on Methods
Studies in Indian Politics
© 2016 Lokniti, Centre for the
Study of Developing Societies
In their famous book on econometric research methods, Angrist and Pischke (2009) urge the reader to
imagine their research questions as a randomized experiment. According to the authors, if you cannot
think of your hypothesis in terms of a randomized experiment, however unrealistic or expensive, you
may not have a causal research question at hand. By imagining the hypothesis as a randomized
experiment, the researcher is forced to clearly identify the ‘treatment’, or the independent variable of
interest. This is important because, as researchers, we are interested not just in describing a phenomenon,
but also in explaining the cause(s) of the phenomenon; it is difficult to make a causal argument
without having a clear idea about what is being ‘manipulated’ or ‘changed’. Other researchers have
echoed this idea: ‘To find out what happens when you change something, it is
necessary to change it’ (Box et al., 1978, p. 495).
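The logic of 'changing it' can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the sample size, the true effect of 2.0, and the unobserved 'background' trait are all hypothetical values chosen for the example and do not come from any study cited here. Because a coin flip decides who is treated, the background trait is balanced across the two groups on average, so a simple difference in mean outcomes lands close to the true effect.

```python
import random
import statistics


def simulate_experiment(n=20000, true_effect=2.0, seed=0):
    """Simulate a randomized experiment and return the difference in means.

    All numbers are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        # An unobserved trait that strongly affects the outcome.
        background = rng.gauss(0, 1)
        # The researcher manipulates treatment with a fair coin flip,
        # so assignment is unrelated to the background trait.
        assigned = rng.random() < 0.5
        outcome = true_effect * assigned + 3.0 * background + rng.gauss(0, 1)
        (treated if assigned else control).append(outcome)
    return statistics.mean(treated) - statistics.mean(control)


estimate = simulate_experiment()
print(f"difference in means: {estimate:.2f} (true effect assumed: 2.0)")
```

The estimate varies with the seed, but with random assignment it should stay near the assumed true effect even though the background trait is never measured.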
Behind every causal statement, there is an implicit assumption of manipulation of some independent
variable that seems to be the ‘cause’ (Holland, 1986). This idea that there is no causation without
manipulation is at the heart of current shifts in social science research, especially in economics
and political science, which have placed greater emphasis on causal analysis that clearly identifies the cause
of a particular outcome. Randomized control trials (RCTs)—experiments where researchers use random
assignment to create well-defined treatment and control groups—have acquired the status of ‘gold
standard’, and have become synonymous with high-quality research.2 However, RCTs are expensive
and resource-intensive to implement, and often answer only narrowly defined questions.
Further, there are many questions of interest that are beyond the realm of RCTs. For example, if one
is interested in ‘causally’ estimating the effect of exposure to violence on a given set of outcomes
Note: This section is coordinated by Divya Vaid (firstname.lastname@example.org).
1 Travers Department of Political Science, University of California at Berkeley, CA, USA.
2 For example, it is very difficult to study discrimination in the job market based on religious and caste identity, since caste and
religion are correlated with other factors, such as education, that impact employability. The confounding factor biases results based
on the use of standard regression analysis on observational data. Banerjee et al. (2009) study the role of caste and religion in India’s
software and call-centre sectors using an RCT. They sent 3,160 fictitious resumes, with caste and religious identity randomly
assigned by changing surnames, to 371 job openings in and around Delhi. Based on the callback rates, they find no evidence of
discrimination against non-upper-caste applicants for software jobs. But, in the case of call-centre jobs, they do find larger and
significant differences between callback rates for upper castes and Other Backward Castes. They find no discrimination against
Anustubh Agnihotri, 1070 Campus Drive, Stanford, CA 94305, USA.
Studies in Indian Politics 4(2)
(say public goods provision), practical and ethical considerations rule out conducting RCTs to study
Given these constraints, what are the options for social scientists interested in making causal claims?
Regression analysis has traditionally been used on observational data to ‘model’ social processes and
‘control’ for other factors. However, there are several limitations to this approach when it comes to
making causal claims. First, analysis based on observational data using multivariate regression models
takes into account some observable confounders, but fails to completely eliminate the possibility of bias,
since we may not be aware of all confounding variables (Freedman, 2006).3 Further, including endogenous
variables4 can also bias the results. At a deeper level, beyond the challenges of confounding and
self-selection, regression models also make strong assumptions about the data generating process5 and make
several methodological assumptions that are difficult to verify (Dunning, 2010).
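The first limitation, bias from confounders that are not accounted for, can be sketched with simulated data. The example below is a hypothetical illustration, not an analysis of any real data set: the confounder `z`, the self-selected treatment `x`, the assumed true effect of 1.0, and the coefficient of 2.0 on `z` are all invented for the sketch. Units with a higher `z` are more likely to take the treatment and also have higher outcomes, so a regression of `y` on `x` alone overstates the effect.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Population covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)


def simulate_observational(n=20000, true_effect=1.0, seed=1):
    """Return (naive, adjusted) OLS estimates of the treatment effect.

    All parameters are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    x, z, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)                           # confounder
        xi = 1.0 if zi + rng.gauss(0, 1) > 0 else 0.0  # self-selected treatment
        yi = true_effect * xi + 2.0 * zi + rng.gauss(0, 1)
        x.append(xi)
        z.append(zi)
        y.append(yi)
    # Bivariate OLS slope of y on x: cov(x, y) / var(x).
    naive = cov(x, y) / cov(x, x)
    # OLS coefficient on x when z is also included as a regressor.
    denom = cov(x, x) * cov(z, z) - cov(x, z) ** 2
    adjusted = (cov(z, z) * cov(x, y) - cov(x, z) * cov(z, y)) / denom
    return naive, adjusted


naive, adjusted = simulate_observational()
print(f"naive: {naive:.2f}, adjusted for z: {adjusted:.2f}, assumed truth: 1.0")
```

The adjusted estimate is close to the assumed truth only because the simulation lets us observe `z` and because the simulated outcome really is linear in `x` and `z`. With real observational data, neither condition can be taken for granted, which is the concern raised above.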
To illustrate this point, let us take the theoretical claim that incumbents are more likely to win
elections.6 We can test this idea using a data set of India’s parliamentary elections from 1977 to 2008.7 India
had nine rounds of parliamentary elections between these years and approximately 543 × 9, that is, 4,887
unique electoral contests. We can analyze this data set to see whether incumbents are more likely to win
an election than non-incumbents. We may run a bivariate regression and find that the coefficient on the
incumbency status variable is statistically significant and positive, indicating that incumbency status
increases the chances of re-election.8 But is this relationship causal? The answer is no. There may be
several factors confounding the relationship between incumbency status (X) and the probability of
re-election (Y). It is possible that incumbents are getting re-elected not because of their incumbency status
but because of some other factor (Z). For instance, it is likely that the incumbents are of intrinsically
higher quality. What if we manage to control for the quality of candidates (Z), run a new multivariate
regression, and find that incumbency still increases the chances of re-election? We still cannot be confident
about a causal relationship, as we have not accounted for all possible observable confounders, such as
differences in background characteristics like higher levels of wealth or self-selection into constituencies
that are more likely to elect an...