Negatively Correlated Bandits

Working Paper: CEPR Discussion Paper DP6983

Authors: Nicolas Klein; Sven Rady

Abstract: We analyze a two-player game of strategic experimentation with two-armed bandits. Each player has to decide in continuous time whether to use a safe arm with a known payoff or a risky arm whose likelihood of delivering payoffs is initially unknown. The quality of the risky arms is perfectly negatively correlated between players. In marked contrast to the case where both risky arms are of the same type, we find that learning will be complete in any Markov perfect equilibrium if the stakes exceed a certain threshold, and that all equilibria are in cutoff strategies. For low stakes, the equilibrium is unique, symmetric, and coincides with the planner's solution. For high stakes, the equilibrium is unique, symmetric, and tantamount to myopic behavior. For intermediate stakes, there is a continuum of equilibria.
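The belief dynamics behind these results can be illustrated in simulation. The Python sketch below is a minimal illustration, not the paper's model specification: it assumes the standard exponential-bandit parameterization (a good risky arm pays lump sums of mean size h at Poisson rate lam, a bad risky arm pays nothing, the safe arm pays a known flow s), perfect negative correlation (exactly one of the two risky arms is good), and both players following a common cutoff strategy in the belief about their own arm. The parameter values and the myopic cutoff s/g used as the default are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter values (not taken from the paper): a good risky
# arm pays lump sums of mean size h at Poisson rate lam, a bad risky arm
# pays nothing, and the safe arm pays a known flow s.
lam, h, s = 1.0, 2.5, 1.0
g = lam * h                  # expected flow payoff of a good risky arm
dt, T = 0.01, 200.0          # time step and horizon of the simulation


def simulate(p0=0.5, cutoff=s / g):
    """Both players use the same cutoff strategy: experiment iff the belief
    that one's *own* risky arm is good exceeds `cutoff` (the myopic cutoff
    s/g is used here purely as an illustrative default)."""
    good_arm = rng.integers(1, 3)    # exactly one risky arm is good
    p = p0                           # belief that player 1's arm is good
    t = 0.0
    while t < T:
        k1 = 1.0 if p > cutoff else 0.0          # does player 1 experiment?
        k2 = 1.0 if (1.0 - p) > cutoff else 0.0  # does player 2 experiment?
        # Only the good arm can produce a success while it is being pulled.
        active = k1 if good_arm == 1 else k2
        if rng.random() < lam * active * dt:
            return good_arm, t                   # a success reveals the state
        # Bayesian update after no success: the belief drifts away from the
        # arm that was experimented on more intensively.
        p -= p * (1.0 - p) * lam * (k1 - k2) * dt
        t += dt
    return None, T                               # state not revealed by T


print(simulate())
```

In this sketch, when only one player experiments, the absence of a success on that player's arm drifts beliefs toward the other arm being good, eventually drawing the idle player back into experimentation; this loosely mirrors the complete-learning result the abstract describes for sufficiently high stakes.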

Keywords: Bayesian Learning; Exponential Distribution; Markov Perfect Equilibrium; Poisson Process; Strategic Experimentation; Two-Armed Bandit

JEL Codes: C73; D83; O32


Causal Claims Network Graph

Edges that are evidenced by causal inference methods are in orange, and the rest are in light blue.


Causal Claims

Cause → Effect
Stakes (g, s) (D50) → Learning outcomes (complete or incomplete) (D52)
Stakes (g, s) (D50) → Efficiency of equilibrium strategies (C73)
Stakes (g, s) (D50) → Player behavior (myopic vs. strategic) (C72)
Player actions (kt) (C72) → Payoffs (u1, u2) (C79)
Beliefs about bandits' qualities (D80) → Player actions (kt) (C72)
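A minimal sketch, assuming networkx and matplotlib are available, of how the causal-claims graph could be drawn with the colour convention stated above. Which edges are evidenced by causal inference methods is not given in this excerpt, so the `evidenced` set below is an empty placeholder.

```python
import matplotlib.pyplot as plt
import networkx as nx

# Directed edges taken from the causal-claims table above (cause -> effect).
edges = [
    ("Stakes (g, s)", "Learning outcomes"),
    ("Stakes (g, s)", "Efficiency of equilibrium strategies"),
    ("Stakes (g, s)", "Player behavior (myopic vs. strategic)"),
    ("Player actions (kt)", "Payoffs (u1, u2)"),
    ("Beliefs about bandits' qualities", "Player actions (kt)"),
]
# Edges backed by causal inference methods would be listed here; the excerpt
# does not say which ones qualify, so the set is left empty as a placeholder.
evidenced = set()

G = nx.DiGraph(edges)
edge_colors = ["orange" if e in evidenced else "lightblue" for e in G.edges()]

pos = nx.spring_layout(G, seed=0)
nx.draw_networkx(G, pos, edge_color=edge_colors, node_color="lightgray",
                 font_size=8, arrows=True)
plt.axis("off")
plt.tight_layout()
plt.show()
```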
