Impact evaluation: Difference between revisions
imported>Nick Bagnall m (m) |
imported>Nick Bagnall (added other examples section) |
Revision as of 05:18, 20 February 2011
An impact evaluation is a study designed to estimate the effects that can be attributed to a policy program or intervention. Impact evaluation is a useful tool for measuring a program’s effectiveness because it doesn’t merely examine whether the program’s goals were met; it also determines whether those goals would have been met in the absence of the program by establishing a cause-and-effect relationship between program activities and the outcomes of interest.
Because impact evaluation reliably quantifies program efficacy, it is used in a variety of program studies and has found particular application in development economics, where it is used to increase the effectiveness of aid delivery in developing nations.
How is a policy’s impact measured?
There are multiple methods for conducting rigorous impact evaluations, yet all necessarily rely on simulating the counterfactual—in other words, estimating what would have happened to the scrutinized group in the absence of the intervention. Counterfactual analysis thus requires a ‘control’ group—people unaffected by the policy intervention—to compare to the program’s beneficiaries, who comprise the ‘treatment’ group of a population sample. The ability to draw causal inferences from the impact evaluation crucially depends on the two groups being statistically identical, meaning there are no systematic differences between them.
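As a minimal sketch of the comparison described above, the impact estimate can be written as the difference in mean outcomes between the treatment group and the control group. The function name and the enrollment figures below are hypothetical and serve only as illustration; the estimate is meaningful only if the two groups have no systematic differences.

```python
from statistics import mean


def estimate_impact(treatment_outcomes, control_outcomes):
    """Return the difference in mean outcomes (treatment minus control).

    The control group's mean stands in for the counterfactual: what the
    treatment group's outcome would have been without the program.
    """
    return mean(treatment_outcomes) - mean(control_outcomes)


# Hypothetical school enrollment rates (percent), invented for illustration.
treatment = [82.0, 79.0, 85.0, 88.0]
control = [75.0, 74.0, 80.0, 77.0]

impact = estimate_impact(treatment, control)
print(f"Estimated impact: {impact:.1f} percentage points")
```

If the groups differ systematically (for example, because families chose whether to enroll in the program), this simple difference mixes the program's effect with those pre-existing differences.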
To minimize systematic differences, researchers design impact evaluations to be at least quasi-experimental. Quasi-experimental approaches can remove bias arising from selection on observables and, where panel data are available, from time-invariant unobservables. Quasi-experimental methods vary but are usually carried out by multivariate regression analysis.
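One way to see how panel data can remove time-invariant differences is a difference-in-differences calculation. This is a sketch of one common quasi-experimental technique, chosen here for illustration and not named in the text above; the function name and all figures are hypothetical. It assumes both groups would have followed the same trend without the program.

```python
from statistics import mean


def difference_in_differences(treat_before, treat_after, ctrl_before, ctrl_after):
    """Change in the treatment group minus change in the comparison group.

    Subtracting each group's own baseline cancels any fixed (time-invariant)
    gap between the groups, whether or not that gap is observed.
    """
    treat_change = mean(treat_after) - mean(treat_before)
    ctrl_change = mean(ctrl_after) - mean(ctrl_before)
    return treat_change - ctrl_change


# Hypothetical outcomes measured before and after the program.
treat_before, treat_after = [50.0, 60.0], [65.0, 75.0]
ctrl_before, ctrl_after = [40.0, 50.0], [45.0, 55.0]

# A naive after-only comparison also picks up the pre-existing gap.
naive_estimate = mean(treat_after) - mean(ctrl_after)
did_estimate = difference_in_differences(
    treat_before, treat_after, ctrl_before, ctrl_after
)
print(f"Naive: {naive_estimate:.1f}, difference-in-differences: {did_estimate:.1f}")
```

In this invented example the treatment group started 10 points ahead of the comparison group, so the after-only comparison overstates the effect by that amount.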
Although quasi-experimental methods have their advantages, systematic differences are best eliminated through a full experimental approach, which involves random assignment: by the law of large numbers, randomly assigning a sufficiently large sample of people yields comparison groups that are statistically equivalent on average, in both observed and unobserved characteristics. Thus, the control group mimics the counterfactual, and any differences that arise between the two groups after the program is implemented may be reliably attributed to the program, provided that threats to the study's validity are controlled for. These threats include:
- The Hawthorne effect, which occurs when members of the treatment group change their behavior in response to the knowledge that they are being studied, not in response to any particular experimental manipulation. The John Henry effect is the corresponding change in behavior among members of the control group.
- No-shows, members of the treatment group who fail to attend some function where their attendance is necessary to the study's design.
- Spillover, which occurs when members of the control group are affected by the intervention.
- Contamination, which occurs when members of treatment and/or comparison groups have access to another intervention which also affects the outcome(s) of interest.
- Crossovers, members of the control group who "cross over" into the treatment group.
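The role of random assignment, and the way one of the threats listed above can distort an estimate, can be illustrated with a small simulation. This is a sketch under stated assumptions: the function name, sample size, effect size, and crossover rate are all hypothetical, and outcomes are generated from a simple additive model that is not taken from any real study.

```python
import random
from statistics import mean


def simulate_trial(n=20000, true_effect=5.0, crossover_rate=0.0, seed=0):
    """Simulate a randomized trial; return (baseline_gap, estimated_effect)."""
    rng = random.Random(seed)
    # Pre-program characteristic (e.g. a baseline test score) for each person.
    baseline = [rng.gauss(50.0, 10.0) for _ in range(n)]

    # Random assignment: shuffle the sample, assign the first half to treatment.
    ids = list(range(n))
    rng.shuffle(ids)
    assigned_to_treatment = set(ids[: n // 2])

    groups = {True: {"baseline": [], "outcome": []},
              False: {"baseline": [], "outcome": []}}
    for i in range(n):
        assigned = i in assigned_to_treatment
        noise = rng.gauss(0.0, 5.0)
        # A crossover is a control-group member who receives the program anyway.
        crosses_over = rng.random() < crossover_rate
        receives_program = assigned or crosses_over
        outcome = baseline[i] + noise + (true_effect if receives_program else 0.0)
        groups[assigned]["baseline"].append(baseline[i])
        groups[assigned]["outcome"].append(outcome)

    # Groups are compared by original assignment, not by program receipt.
    baseline_gap = mean(groups[True]["baseline"]) - mean(groups[False]["baseline"])
    estimated_effect = mean(groups[True]["outcome"]) - mean(groups[False]["outcome"])
    return baseline_gap, estimated_effect


clean_gap, clean_effect = simulate_trial(crossover_rate=0.0)
_, diluted_effect = simulate_trial(crossover_rate=0.2)
print(f"Baseline gap between groups: {clean_gap:.2f}")
print(f"Estimated effect without crossovers: {clean_effect:.2f}")
print(f"Estimated effect with 20% crossovers: {diluted_effect:.2f}")
```

With a large sample, the baseline gap between the randomly assigned groups should be close to zero and the estimate close to the simulated true effect. When some control-group members cross over, the comparison by assigned group understates the effect, because part of the control group has also benefited from the program.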
These threats can affect the validity of all types of impact evaluation, randomized or otherwise. Non-experimental and quasi-experimental evaluations face additional methodological issues such as confounding factors and selection bias. While random assignment addresses these issues to maximize an impact evaluation’s internal validity (the degree to which the estimated effect can be attributed to the program rather than to other factors), there remain inherent limitations to an impact evaluation’s external validity (the ability to generalize the study’s results to other populations).
Testing a program in multiple disparate settings is one way to determine whether the program's results are generally replicable and thus worth “scaling up.” Knowledge of a particular setting can help determine whether a program's results can indeed be replicated in that context. For example, consider a policy intervention designed to increase school enrollment by informing parents about the positive correlation between additional schooling and increased wages: if, in a given school system, parents are inclined to underestimate the effects of additional schooling on wages, they are more likely to be influenced by the information than parents in another school system who generally overestimate those effects.
Other examples of impact evaluations
The most renowned large-scale development experiment/impact evaluation is a conditional cash transfer program named Oportunidades (formerly known as Progresa). The program, launched by the Mexican government in 1997, targets poverty by providing cash payments to families whose children meet certain conditions such as regular school attendance. Inspired by the success of Oportunidades, similar conditional cash transfer programs have since been implemented by a number of governments in developing countries.
Although Oportunidades has proven effective in improving a number of development outcomes among beneficiaries, it is very expensive, so other organizations have conducted a number of cost-effectiveness studies to compare alternative solutions. In comparing the cost-effectiveness of various programs designed to improve school participation, for example, the Abul Latif Jameel Poverty Action Lab (J-PAL) at the Massachusetts Institute of Technology found that distributing de-worming tablets to children was substantially more cost-effective than conditional cash transfer programs. J-PAL offers a searchable database (http://www.povertyactionlab.org/evaluations?filters=type:evaluation&filters=type:evaluation) of hundreds of randomized impact evaluations conducted by J-PAL or its affiliates.