Two-Armed Restless Bandits with Imperfect Information: Stochastic Control and Indexability

NBER Working Paper No. w19043

Authors: Roland G. Fryer Jr.; Philipp Harms

Abstract: We present a two-armed bandit model of decision making under uncertainty where the expected return to investing in the "risky" arm increases when choosing that arm and decreases when choosing the "safe" arm. These dynamics are natural in applications such as human capital development, job search, and occupational choice. Using new insights from stochastic control, along with a monotonicity condition on the payoff dynamics, we show that optimal strategies in our model are stopping rules that can be characterized by an index which formally coincides with Gittins' index. Our result implies the indexability of a new class of "restless" bandit models.
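The dynamics described in the abstract can be illustrated with a minimal simulation. This is a hedged sketch, not the paper's actual model: the parameters (`safe_payoff`, `drift`, `mu0`, the exploration `bonus`) and the Gaussian noise are all hypothetical choices made here for illustration. The risky arm's expected return drifts up when that arm is chosen and down when the safe arm is chosen, and an index-style threshold rule decides which arm to pull.

```python
import random

# Illustrative sketch (not the paper's model): a two-armed "restless" bandit
# in which the risky arm's expected return rises when it is chosen and decays
# when the safe arm is chosen. All parameters are hypothetical.

def simulate(policy, horizon=50, seed=0, safe_payoff=1.0, drift=0.1, mu0=0.5):
    rng = random.Random(seed)
    mu = mu0          # current expected return of the risky arm
    total = 0.0
    for _ in range(horizon):
        if policy(mu, safe_payoff):       # choose the risky arm
            total += rng.gauss(mu, 1.0)   # noisy payoff centered at mu
            mu += drift                   # return improves with investment
        else:                             # choose the safe arm
            total += safe_payoff
            mu -= drift                   # risky arm's return decays
    return total

def index_policy(mu, safe_payoff, bonus=0.3):
    # A threshold (index-style) stopping rule: invest in the risky arm while
    # its expected return plus an exploration bonus exceeds the safe payoff.
    # The bonus stands in, very loosely, for the value of information.
    return mu + bonus >= safe_payoff
```

Because the risky arm's mean is state-dependent (it moves with the history of choices), this is a "restless" bandit rather than a classical one; the paper's contribution is showing that, under a monotonicity condition, such models are nevertheless indexable.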

Keywords: bandit models; stochastic control; human capital; investment strategies; indexability

JEL Codes: J0; J24; L0


Causal Claims Network Graph

Edges that are evidenced by causal inference methods are in orange, and the rest are in light blue.


Causal Claims

Cause → Effect
Investing in the unknown arm (G11) → Human capital (J24)
Human capital (J24) → Future rewards (D15)
Investing in the unknown arm (G11) → Future rewards (D15)
Today's investments (G11) → Profitability of future investments (G31)
Investment strategy (G11) → Human capital (J24)
Investment strategy (G11) → Future rewards (D15)
Investing in the unknown arm (G11) → Information about reward distribution (D30)
Information about reward distribution (D30) → Informed decisions (D87)
Investment strategies (G11) → Policy recommendations (D78)
