Undiscounted Bandit Games

Working Paper: CEPR Discussion Paper DP14046

Authors: Godfrey Keller; Sven Rady

Abstract: We analyze undiscounted continuous-time games of strategic experimentation with two-armed bandits. The risky arm generates payoffs according to a Lévy process with an unknown average payoff per unit of time which nature draws from an arbitrary finite set. Observing all actions and realized payoffs, players use Markov strategies with the common posterior belief about the unknown parameter as the state variable. We show that the unique symmetric Markov perfect equilibrium can be computed in a simple closed form involving only the payoff of the safe arm, the expected current payoff of the risky arm, and the expected full-information payoff, given the current belief. In particular, the equilibrium does not depend on the precise specification of the payoff-generating processes.
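The closed form described in the abstract involves only three belief-dependent quantities: the safe payoff s, the expected current payoff m(p) of the risky arm, and the expected full-information payoff f(p). Below is a minimal sketch of how these ingredients are computed from a posterior over the finite set of possible average payoffs, assuming (as in the abstract) that the unknown parameter is the risky arm's average payoff per unit of time; the function names and numbers are illustrative, and the equilibrium formula itself is in the paper.

```python
import numpy as np

def expected_current_payoff(belief, mu):
    """m(p): expected payoff per unit of time of the risky arm,
    sum over states of p_theta * mu_theta."""
    return float(belief @ mu)

def full_information_payoff(belief, mu, s):
    """f(p): expected payoff under full information,
    sum over states of p_theta * max(s, mu_theta),
    since a fully informed player picks the better arm in each state."""
    return float(belief @ np.maximum(s, mu))

# Illustrative (hypothetical) numbers: safe payoff s and three possible
# average payoffs of the risky arm, drawn by nature from a finite set.
s = 1.0
mu = np.array([0.5, 1.0, 2.0])
p = np.array([0.5, 0.3, 0.2])   # current common posterior belief

m = expected_current_payoff(p, mu)
f = full_information_payoff(p, mu, s)
print(f"s = {s}, m(p) = {m:.3f}, f(p) = {f:.3f}")
# f(p) >= s always; experimentation can only pay when f(p) > s, i.e. when
# the risky arm is strictly better than the safe arm in at least one state
# to which the players assign positive probability.
```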

Keywords: Strategic experimentation; Two-armed bandit; Strong long-run average criterion; Markov perfect equilibrium; HJB equation; Viscosity solution

JEL Codes: C73; D83


Causal Claims Network Graph

[Figure: network graph of causal claims.] Edges evidenced by causal inference methods are shown in orange; all other edges are in light blue.


Causal Claims

Cause → Effect
Intensity of experimentation performed by others (C91) → Players' optimal actions (C73)
Payoff of the safe arm (G51) → Players' optimal actions (C73)
Expected current payoff of the risky arm (D81) → Players' optimal actions (C73)
Expected full-information payoff (D89) → Players' optimal actions (C73)
Payoffs generated from both arms (G19) → Evolution of players' posterior beliefs (C73)
Structure of the payoff-generating processes (G19) → Learning dynamics within the game (C73)
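The last two edges above concern learning: realized payoffs drive the evolution of the common posterior belief, and the learning dynamics depend on the structure of the payoff-generating process. A minimal discrete-time sketch of the belief update, assuming for illustration the Brownian special case of the Lévy payoff process (drift equal to the unknown mean, known volatility); all names and numbers here are hypothetical.

```python
import numpy as np

def bayes_update(belief, mu, dx, dt, sigma):
    """One Bayes update of the posterior over the finite set of possible
    means, given an observed payoff increment dx from the risky arm over a
    period dt. The Gaussian likelihood is the Brownian special case; other
    Levy specifications would change the likelihood, not the Bayes logic."""
    lik = np.exp(-(dx - mu * dt) ** 2 / (2.0 * sigma ** 2 * dt))
    posterior = belief * lik
    return posterior / posterior.sum()

mu = np.array([0.5, 1.0, 2.0])     # possible average payoffs (finite set)
p = np.array([0.5, 0.3, 0.2])      # common prior
true_mu, dt, sigma = 2.0, 0.1, 1.0
rng = np.random.default_rng(0)

for _ in range(200):               # observe 200 increments of the risky arm
    dx = true_mu * dt + sigma * np.sqrt(dt) * rng.normal()
    p = bayes_update(p, mu, dx, dt, sigma)

print(np.round(p, 3))              # posterior concentrates on the true mean
```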
