Working Paper: NBER ID: w10118
Authors: Kenneth Y. Chay; Patrick J. McEwan; Miguel Urquiola
Abstract: Several countries have implemented programs that use test scores to rank schools, and to reward or penalize them based on their students' average performance. Recently, Kane and Staiger (2002) have warned that imprecision in the measurement of school-level test scores could impede these efforts. There is little evidence, however, on how seriously noise hinders the evaluation of the impact of these interventions. We examine these issues in the context of Chile's P-900 program, a country-wide intervention in which resources were allocated based on cutoffs in schools' mean test scores. We show that transitory noise in average scores and mean reversion lead conventional estimation approaches to greatly overstate the impacts of such programs. We then show how a regression discontinuity design that utilizes the discrete nature of the selection rule can be used to control for reversion biases. While the RD analysis provides convincing evidence that the P-900 program had significant effects on test score gains, these effects are much smaller than is widely believed.
Keywords: No keywords provided
JEL Codes: I2
Cause | Effect |
---|---|
transitory noise and mean reversion (C22) | estimated impact of P-900 program (O22) |
P-900 program (C87) | impact on test scores (I24) |
previous evaluations (C52) | overstatement of P-900 effectiveness (C87) |
conventional methods (DID) (C90) | inflated estimates of P-900 effectiveness (C87) |
RD analysis (R20) | no significant test score gains from 1988 to 1990 (I21) |
RD analysis (R20) | modest increase of about 0.2 standard deviations in gains from 1988 to 1992 (E65) |
RD design (O32) | clearer causal interpretation of P-900 impact (F69) |
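
The mechanism in the abstract, where mean reversion inflates a conventional gain comparison while a regression discontinuity at the selection cutoff does not, can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not the authors' estimator or data: the function names, the single cutoff, the noise level, the bandwidth, and the true effect of 0.2 are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate_schools(n, true_effect, cutoff, noise_sd, seed):
    """Return (pre_score, gain, treated) per school. All parameters are hypothetical."""
    rng = random.Random(seed)
    schools = []
    for _ in range(n):
        quality = rng.gauss(0.0, 1.0)                    # persistent school quality
        pre = quality + rng.gauss(0.0, noise_sd)         # noisy baseline mean score
        treated = pre < cutoff                           # selection rule: below cutoff
        post = quality + rng.gauss(0.0, noise_sd)        # fresh transitory noise
        if treated:
            post += true_effect
        schools.append((pre, post - pre, treated))
    return schools


def naive_gain_difference(schools):
    """Mean gain of treated schools minus mean gain of untreated schools."""
    treated = [g for _, g, t in schools if t]
    control = [g for _, g, t in schools if not t]
    return statistics.fmean(treated) - statistics.fmean(control)


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def rd_estimate(schools, cutoff, bandwidth):
    """Local linear RD: gap between the two fitted lines at the cutoff."""
    below = [(s - cutoff, g) for s, g, t in schools if t and s >= cutoff - bandwidth]
    above = [(s - cutoff, g) for s, g, t in schools if not t and s <= cutoff + bandwidth]
    a_below, _ = fit_line([x for x, _ in below], [y for _, y in below])
    a_above, _ = fit_line([x for x, _ in above], [y for _, y in above])
    return a_below - a_above  # treated side minus untreated side


TRUE_EFFECT = 0.2   # illustrative value, not a result taken from the paper
CUTOFF = -0.5       # hypothetical eligibility threshold on the baseline score

schools = simulate_schools(n=50000, true_effect=TRUE_EFFECT, cutoff=CUTOFF,
                           noise_sd=0.5, seed=0)
naive = naive_gain_difference(schools)
rd = rd_estimate(schools, cutoff=CUTOFF, bandwidth=0.5)
print(f"true effect: {TRUE_EFFECT:.2f}  naive: {naive:.2f}  RD: {rd:.2f}")
```

Under these assumptions, schools selected for low baseline scores tend to have drawn unlucky noise, so their scores rebound even without any program. The naive comparison attributes that rebound to the treatment and should land well above the true effect, while the RD estimate compares schools just on either side of the cutoff, where the expected rebound is nearly identical.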