The Central Role of Noise in Evaluating Interventions that Use Test Scores to Rank Schools

Working Paper: NBER ID: w10118

Authors: Kenneth Y. Chay; Patrick J. McEwan; Miguel Urquiola

Abstract: Several countries have implemented programs that use test scores to rank schools, and to reward or penalize them based on their students' average performance. Recently, Kane and Staiger (2002) have warned that imprecision in the measurement of school-level test scores could impede these efforts. There is little evidence, however, on how seriously noise hinders the evaluation of the impact of these interventions. We examine these issues in the context of Chile's P-900 program, a country-wide intervention in which resources were allocated based on cutoffs in schools' mean test scores. We show that transitory noise in average scores and mean reversion lead conventional estimation approaches to greatly overstate the impacts of such programs. We then show how a regression discontinuity design that utilizes the discrete nature of the selection rule can be used to control for reversion biases. While the RD analysis provides convincing evidence that the P-900 program had significant effects on test score gains, these effects are much smaller than is widely believed.
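
Illustration: the mechanism described in the abstract can be reproduced in a small simulation. The sketch below is not the authors' specification and uses no P-900 data. It assumes a hypothetical model in which each school's observed mean score is a permanent quality component plus independent transitory noise, and in which schools scoring below a cutoff in the first year receive a program with a known true effect. All names and parameter values (simulate, naive_gain_difference, rd_estimate, noise_sd, cutoff, bandwidth) are invented for the example. It compares a conventional comparison of average gains with a simple local linear regression discontinuity estimate at the cutoff.

```python
import random
from statistics import mean


def simulate(n_schools=50_000, true_effect=0.2, noise_sd=0.5, cutoff=-0.5, seed=0):
    """Return (pre_score, gain, treated) per school under the assumed model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_schools):
        quality = rng.gauss(0.0, 1.0)                    # permanent component
        pre = quality + rng.gauss(0.0, noise_sd)         # year-1 mean score
        treated = pre < cutoff                           # sharp selection rule
        post = quality + rng.gauss(0.0, noise_sd)        # fresh transitory noise
        if treated:
            post += true_effect                          # assumed program effect
        rows.append((pre, post - pre, treated))
    return rows


def naive_gain_difference(rows):
    """Average gain of treated schools minus average gain of untreated schools."""
    treated = [gain for _, gain, t in rows if t]
    control = [gain for _, gain, t in rows if not t]
    return mean(treated) - mean(control)


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def rd_estimate(rows, cutoff=-0.5, bandwidth=0.5):
    """Difference in fitted gains at the cutoff, using separate lines on each side."""
    left = [(pre - cutoff, gain) for pre, gain, t in rows
            if t and pre >= cutoff - bandwidth]
    right = [(pre - cutoff, gain) for pre, gain, t in rows
             if not t and pre < cutoff + bandwidth]
    intercept_left, _ = fit_line(*zip(*left))
    intercept_right, _ = fit_line(*zip(*right))
    return intercept_left - intercept_right


if __name__ == "__main__":
    rows = simulate()
    print("true effect:          0.2")
    print("naive gain difference:", round(naive_gain_difference(rows), 3))
    print("RD estimate at cutoff:", round(rd_estimate(rows), 3))
```

Under these assumptions, schools selected for low first-year scores tend to have had unlucky noise draws, so their scores rebound the following year even without any program. The naive comparison attributes that rebound to the intervention, while the RD estimate compares schools just on either side of the cutoff, whose expected rebound is nearly identical.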

Keywords: No keywords provided

JEL Codes: I2

Causal Claims Network Graph

Edges that are evidenced by causal inference methods are in orange, and the rest are in light blue.

Causal Claims

Cause	Effect
transitory noise and mean reversion (C22)	estimated impact of P-900 program (O22)
P-900 program (C87)	impact on test scores (I24)
previous evaluations (C52)	overstatement of P-900 effectiveness (C87)
conventional methods (DID) (C90)	inflated estimates of P-900 effectiveness (C87)
RD analysis (R20)	no significant test score gains from 1988 to 1990 (I21)
RD analysis (R20)	modest increase of about 0.2 standard deviations in gains from 1988 to 1992 (E65)
RD design (O32)	clearer causal interpretation of P-900 impact (F69)
