TH AI LG May 12, 2022

Social learning via actions in bandit environments

arXiv:2205.06107v11.2 h-index: 1

Originality: Incremental advance

AI Analysis

This paper addresses strategic decision-making in multi-agent bandit environments. It is relevant to economics and AI but incremental, since it builds on existing Bayesian bandit models.

The paper analyzes strategic exploration in Bayesian bandit games where agents have private payoffs and public actions, focusing on cascade equilibria in which agents switch from the risky action to the riskless action once they become sufficiently pessimistic. It finds that individual exploration can be above or below the single-agent level depending on whether agents share a common prior, with the most optimistic agent always underexploring. It also shows that enforceable ex-ante contracts lead the most optimistic agent to buy all payoff streams, which offers an explanation for start-up buyouts.

I study a game of strategic exploration with private payoffs and public actions in a Bayesian bandit setting. In particular, I look at cascade equilibria, in which agents switch over time from the risky action to the riskless action only when they become sufficiently pessimistic. I show that these equilibria exist under some conditions and establish their salient properties. Individual exploration in these equilibria can be more or less than the single-agent level depending on whether the agents start out with a common prior or not, but the most optimistic agent always underexplores. I also show that allowing the agents to write enforceable ex-ante contracts will lead the most ex-ante optimistic agent to buy all payoff streams, providing an explanation for the buyout of smaller start-ups by more established firms.
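As a rough illustration of the single-agent benchmark mentioned above, the following is a minimal sketch of a cutoff rule in a two-state Bayesian bandit. It is not the paper's model. The success probabilities `p_good` and `p_bad`, the function names, and the fixed `cutoff` are all assumptions made for this example, and the strategic element (learning from other agents' public actions) is deliberately left out.

```python
import random


def update_belief(belief, success, p_good=0.7, p_bad=0.3):
    """Bayes update of P(risky arm is good) after one observed payoff.

    p_good / p_bad are hypothetical success probabilities in each state.
    """
    like_good = p_good if success else 1.0 - p_good
    like_bad = p_bad if success else 1.0 - p_bad
    numerator = belief * like_good
    return numerator / (numerator + (1.0 - belief) * like_bad)


def explore_until_pessimistic(prior, cutoff, arm_is_good, rng,
                              max_rounds=1000, p_good=0.7, p_bad=0.3):
    """Pull the risky arm while the belief is at least `cutoff`.

    Returns the number of risky pulls and the final belief. The agent
    switches to the riskless action (stops) once belief < cutoff.
    """
    belief = prior
    p_success = p_good if arm_is_good else p_bad
    rounds = 0
    while belief >= cutoff and rounds < max_rounds:
        success = rng.random() < p_success  # private payoff draw
        belief = update_belief(belief, success, p_good, p_bad)
        rounds += 1
    return rounds, belief


if __name__ == "__main__":
    # Two agents facing the same bad arm and the same payoff draws,
    # differing only in their (hypothetical) prior optimism.
    for prior in (0.5, 0.8):
        rounds, belief = explore_until_pessimistic(
            prior, cutoff=0.2, arm_is_good=False, rng=random.Random(0))
        print(f"prior={prior}: {rounds} risky pulls, final belief {belief:.3f}")
```

In this toy setup a more optimistic prior never shortens exploration for a given sequence of payoff draws, because the Bayes update is monotone in the prior. The paper's results concern how this changes when agents also observe each other's actions, which the sketch does not attempt to capture.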
