Test-Based Accountability and Student Achievement: An Investigation of Differential Performance on NAEP and State Assessments

Working Paper: NBER ID: w12817

Abstract: This paper explores the phenomenon referred to as test score inflation, which occurs when achievement gains on "high-stakes" exams outpace improvements on "low-stakes" tests. The first part of the paper documents the extent to which student performance trends on state assessments differ from those on the National Assessment of Educational Progress (NAEP). I find evidence of considerable test score inflation in several different states, including those with quite different state testing systems. The second part of the paper is a case study of Texas that uses detailed item-level data from the Texas Assessment of Academic Skills (TAAS) and the NAEP to explore why performance trends differed across these exams during the 1990s. I find that the differential improvement on the TAAS cannot be explained by several important differences across the exams (e.g., the NAEP includes open-response items, many NAEP multiple-choice items require/permit the use of calculators, rulers, protractors or other manipulatives). I find that skill and format differences across exams explain the disproportionate improvement in the TAAS for fourth graders, although these differences cannot explain the time trends for eighth graders.

Keywords: test score inflation; student achievement; test-based accountability; NAEP; state assessments

JEL Codes: I2

Causal Claims Network Graph

Edges that are evidenced by causal inference methods are in orange, and the rest are in light blue.

Causal Claims

Cause	Effect
test-based accountability (H52)	student achievement (I24)
nature of state assessments (H70)	performance on NAEP (I24)
differential performance on state assessments (I24)	differential performance on NAEP (I24)
skill and format differences across exams (C87)	disproportionate improvements for fourth graders (I24)
differences in content and format between assessments (C52)	performance outcomes (L25)
manipulation of the test-taking pool (C90)	differences in student effort across exams (D29)
