GT PR · Feb 17

Can a Weaker Player Win? Adaptive Play in Repeated Games

arXiv:2604.15315 · h-index: 13

AI Analysis

This work addresses strategic adaptation in game theory for scenarios where weaker players seek to exploit dynamics over repeated interactions, though it appears incremental as it builds on existing control and game theory frameworks.

The paper investigates whether a weaker player can achieve a positive gain in repeated two-player games by adaptively switching between offensive and defensive styles, even though each style on its own is losing in expectation. It identifies parameter regimes where the optimal adaptive policy yields a strictly positive gain at some horizon. As the number of games grows, the limiting gain can take any value in [0, 1] when the defensive style guarantees a draw (the safe case), tends to -1 when both styles are strictly losing, and tends to 0 when the defensive style is fair but not safe.

Consider a two-player game repeated $N$ times. Player 1 can choose between two styles (for interpretability, offensive and defensive), whereas Player 2 uses a single fixed style. Let $X_N := \#\text{wins} - \#\text{losses}$ for Player 1 after $N$ games, and define the match gain as $E[\mathrm{sign}(X_N)]$, with $\mathrm{sign}(0) = 0$. We assume Player 1 is weaker in the sense that each pure style is losing in expectation. Our objective is to identify under which parameter regimes Player 1 can nevertheless achieve a positive gain under an optimal adaptive policy. Using dynamic programming, we solve the finite-horizon control problem and numerically identify parameter regimes in which the optimal gain is strictly positive at some horizon $N^\star$. We also derive structural conditions guaranteeing that $g^\star_N$ is always negative, and regimes (notably with fair (D)) where $g^\star_N$ is nonnegative for all $N$ and can be strictly positive for every $N \ge 2$. We then characterize the asymptotic behavior as $N \to \infty$ for a weak player. In the safe case, where the defensive style induces a sure draw, the limiting gain varies continuously with the parameters and may take any value in $[0, 1]$. In the non-safe case, the limiting gain converges to $-1$ when both styles are strictly losing, and to $0$ when (D) is fair (and non-safe).
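The finite-horizon dynamic program mentioned in the abstract can be sketched by backward induction on the score difference. This is an illustrative sketch, not the paper's code. It assumes each style is described by a hypothetical triple `(p_win, p_draw, p_loss)` of per-game probabilities, that games are independent, and that Player 1 may pick a style before every game knowing the current difference and the number of games remaining. The function name `optimal_gain` and the parameter values in the usage note are made up for illustration.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x (sign(0) = 0)."""
    return (x > 0) - (x < 0)


def optimal_gain(n_games, styles):
    """Optimal expected sign of (wins - losses) after n_games.

    styles: list of (p_win, p_draw, p_loss) triples, one per available style
            (an assumed parameterization; each triple should sum to 1).
    """
    offset = n_games                      # index of score difference 0
    size = 2 * n_games + 1                # differences -n_games .. n_games
    # Terminal condition: with no games left, the value is sign(x).
    value = [float(sign(i - offset)) for i in range(size)]

    for remaining in range(1, n_games + 1):
        new = value[:]
        # With `remaining` games left, only |x| <= n_games - remaining is reachable.
        reach = n_games - remaining
        for x in range(-reach, reach + 1):
            i = x + offset
            # Bellman step: choose the style with the best expected continuation.
            new[i] = max(
                pw * value[i + 1] + pd * value[i] + pl * value[i - 1]
                for pw, pd, pl in styles
            )
        value = new

    return value[offset]                  # start of the match: difference 0
```

As a usage example under these assumptions, take a losing offensive style `(0.45, 0.0, 0.55)` and a defensive style that always draws, `(0.0, 1.0, 0.0)`. Then `optimal_gain(2, [(0.45, 0.0, 0.55), (0.0, 1.0, 0.0)])` is approximately 0.1475: Player 1 attacks first, then defends when ahead and keeps attacking when behind. Neither style alone gives a positive gain, which is the kind of effect the abstract describes for a fair defensive style.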
