Is Thompson Sampling Susceptible to Algorithmic Collusion?
This addresses the problem of unintended collusion in multi-agent systems for researchers and practitioners in game theory and AI, showing it is incremental by analyzing a specific algorithm under certain conditions.
The paper investigates whether Thompson Sampling leads to algorithmic collusion in repeated games with unknown payoffs, finding that under a mild assumption on payoff matrices, it converges to Nash equilibrium, preventing collusion, but collusion can occur when this assumption is not met.
When two players are engaged in a repeated game with unknown payoff matrices, they may use single-agent multi-armed bandit algorithms to choose the actions independent of each other. We show that when the players use Thompson sampling, the game dynamics converges to the Nash equilibrium under a mild assumption on the payoff matrices. Therefore, algorithmic collusion doesn't arise in this case despite the fact that the players do not intentionally deploy competitive strategies. To prove the convergence result, we find that the framework developed in stochastic approximation doesn't apply, because of the sporadic and infrequent updates of the inferior actions and the lack of Lipschitz continuity. We develop a novel sample-path-wise approach to show the convergence. However, when the payoff matrices do not satisfy the assumption, the game may converge to collusive outcomes.