Generation Next: Experimentation with AI

Working Paper: NBER ID: w31679

Authors: Gary Charness; Brian Jabarian; John A. List

Abstract: We investigate the potential for Large Language Models (LLMs) to enhance scientific practice within experimentation by identifying key areas, directions, and implications. First, we discuss how these models can improve experimental design, including refining the wording of elicitations, coding experiments, and producing documentation. Second, we delve into the use of LLMs in experiment implementation, with an emphasis on bolstering causal inference by creating consistent experiences, improving instruction comprehension, and monitoring participant engagement in real time. Third, we underscore the role of LLMs in analyzing experimental data, encompassing tasks like pre-processing, data cleaning, and assisting reviewers and replicators in examining studies. Each of these tasks improves the probability of reporting accurate findings. Lastly, we suggest a scientific governance framework that mitigates the potential risks of using LLMs in experimental research while amplifying their advantages. This could pave the way for open science opportunities and foster a culture of policy and industry experimentation at scale.
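The abstract highlights LLM-assisted data pre-processing and cleaning as one route to more accurate findings. A minimal sketch of that idea, in which the `llm_normalize` helper is hypothetical and stands in for a call to any LLM API that maps free-text participant answers onto canonical category labels:

```python
def llm_normalize(response: str) -> str:
    """Hypothetical stand-in for an LLM call that maps a free-text
    participant answer onto a canonical category label. A real
    implementation would prompt an LLM; here a lookup table
    illustrates the input/output contract."""
    canonical = {"yes": "yes", "y": "yes", "yeah": "yes",
                 "no": "no", "n": "no", "nope": "no"}
    return canonical.get(response.strip().lower(), "unclear")

def clean_responses(raw_answers):
    """Pre-process raw survey answers into analysis-ready labels,
    flagging anything the normalizer cannot resolve as 'unclear'."""
    return [llm_normalize(answer) for answer in raw_answers]

raw = ["Yes", " y ", "Nope", "maybe?"]
print(clean_responses(raw))  # ['yes', 'yes', 'no', 'unclear']
```

The point of the sketch is the division of labor: the model handles the messy mapping from free text to categories, while deterministic code around it keeps the cleaning step auditable and replicable.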

Keywords: large language models; AI; experimental design; causal inference; open science

JEL Codes: C0; C1; C80; C82; C87; C9; C90; C92; C99


Causal Claims Network Graph

Edges supported by causal inference methods are shown in orange; the rest are shown in light blue.


Causal Claims

Cause → Effect
LLMs (Y20) → improved elicitation and wording of experimental tasks (C90)
improved elicitation and wording of experimental tasks (C90) → better participant understanding and engagement (C90)
LLMs (Y20) → better participant understanding and engagement (C90)
LLMs (Y20) → real-time monitoring of participant engagement (C90)
real-time monitoring of participant engagement (C90) → improved data quality (L15)
LLMs (Y20) → improved data quality (L15)
LLMs (Y20) → data preprocessing and cleaning (C80)
data preprocessing and cleaning (C80) → more accurate findings (C52)
LLMs (Y20) → more accurate findings (C52)