Working Paper: NBER ID: w18010
Authors: Donald Boyd; Hamilton Lankford; Susanna Loeb; James Wyckoff
Abstract: Test-based accountability including value-added assessments and experimental and quasi-experimental research in education rely on achievement tests to measure student skills and knowledge. Yet we know little regarding important properties of these tests, an important example being the extent of test measurement error and its implications for educational policy and practice. While test vendors provide estimates of split-test reliability, these measures do not account for potentially important day-to-day differences in student performance. We show there is a credible, low-cost approach for estimating the total test measurement error that can be applied when one or more cohorts of students take three or more tests in the subject of interest (e.g., state assessments in three consecutive grades). Our method generalizes the test-retest framework allowing for either growth or decay in knowledge and skills between tests as well as variation in the degree of measurement error across tests. The approach maintains relatively unrestrictive, testable assumptions regarding the structure of student achievement growth. Estimation only requires descriptive statistics (e.g., correlations) for the tests. When student-level test-score data are available, the extent and pattern of measurement error heteroskedasticity also can be estimated. Utilizing math and ELA test data from New York City, we estimate the overall extent of test measurement error is more than twice as large as that reported by the test vendor and demonstrate how using estimates of the total measurement error and the degree of heteroskedasticity along with observed scores can yield meaningful improvements in the precision of student achievement and achievement-gain estimates.
Keywords: Test Measurement Error; Educational Assessments; Standardized Testing
JEL Codes: I21
Cause | Effect |
---|---|
measurement error estimates (C20) | precision of student achievement estimates (C13) |
measurement error estimates (C20) | precision of achievement gain estimates (C13) |
measurement error (C20) | estimation of teacher effectiveness (A21) |
measurement error (C20) | estimation of school effectiveness (I21) |
measurement error (C20) | validity of achievement measures (C52) |
failure to account for measurement error (C52) | flawed educational policies (I28) |
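
The abstract notes that estimation needs only descriptive statistics, such as correlations, across three or more tests. The sketch below illustrates that idea with the classical three-wave result, which is a simpler special case and not the authors' estimator: if true achievement follows a first-order process and measurement errors are uncorrelated across tests, the reliability of the middle test equals r12 * r23 / r13. It also shows, again only as an assumed illustration, how a reliability estimate can be combined with an observed score through Kelley's shrinkage formula. The function names and all numbers are hypothetical.

```python
def middle_test_reliability(r12: float, r23: float, r13: float) -> float:
    """Reliability of the middle of three tests from pairwise correlations.

    Assumes (hypothetically) a first-order true-score process and
    measurement errors that are uncorrelated across tests. This is the
    classical three-wave result, not the paper's more general method.
    """
    if r13 == 0:
        raise ValueError("r13 must be nonzero")
    return r12 * r23 / r13


def shrunken_score(observed: float, mean: float, reliability: float) -> float:
    """Kelley's estimate of a true score: pull the observed score toward
    the group mean in proportion to the share of variance that is noise."""
    return mean + reliability * (observed - mean)


# Illustrative inputs only: correlations implied by a reliability of 0.85
# on every test and a true-score correlation of 0.9 between adjacent grades.
r12 = 0.85 * 0.9
r23 = 0.85 * 0.9
r13 = 0.85 * 0.9 * 0.9

reliability = middle_test_reliability(r12, r23, r13)
error_share = 1.0 - reliability  # share of score variance due to measurement error

# A hypothetical observed scale score of 700 in a cohort with mean 650.
estimate = shrunken_score(700.0, 650.0, reliability)
```

Because the formula uses only three correlations, it can be applied to published summary statistics without student-level data. Estimating heteroskedasticity, as the abstract describes, would require student-level scores and is not covered by this sketch.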