GT LG | Sep 30, 2019

Strategizing against No-regret Learners

Yuan Deng, Jon Schneider, Balasubramanian Sivan

arXiv:1909.13861v221.885 citations

Originality: Incremental advance

AI Analysis

This paper addresses strategic decision-making in repeated games against adaptive opponents and makes an incremental contribution to the understanding of no-regret dynamics.

The paper asks how a player should strategize against a no-regret learner in a repeated game to maximize utility. It shows that, under mild assumptions, the player can guarantee at least the Stackelberg equilibrium utility, and that when the learner has more than two actions and plays a mean-based strategy, the player can achieve strictly higher utility.

How should a player who repeatedly plays a game against a no-regret learner strategize to maximize his utility? We study this question and show that under some mild assumptions, the player can always guarantee himself a utility of at least what he would get in a Stackelberg equilibrium of the game. When the no-regret learner has only two actions, we show that the player cannot get any higher utility than the Stackelberg equilibrium utility. But when the no-regret learner has more than two actions and plays a mean-based no-regret strategy, we show that the player can get strictly higher than the Stackelberg equilibrium utility. We provide a characterization of the optimal game-play for the player against a mean-based no-regret learner as a solution to a control problem. When the no-regret learner's strategy also guarantees him a no-swap regret, we show that the player cannot get anything higher than a Stackelberg equilibrium utility.
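The first claim of the abstract, that committing to a fixed strategy already secures roughly the Stackelberg utility, can be illustrated with a small simulation. The sketch below is not the paper's construction. It assumes a hypothetical 2x2 game (the payoff matrices are invented for illustration), uses Hedge (multiplicative weights) as one standard mean-based no-regret learner, and works with expected payoffs rather than sampled actions so the run is deterministic. In this game the player's action U is dominant, so simultaneous play yields him 2, while committing to play U with probability just under 0.5 makes R the learner's best response and yields close to 3.5.

```python
import math

def hedge_distribution(cumulative_payoffs, eta):
    """Hedge / multiplicative weights: softmax of eta * cumulative payoffs."""
    top = max(cumulative_payoffs)  # subtract the max for numerical stability
    weights = [math.exp(eta * (c - top)) for c in cumulative_payoffs]
    total = sum(weights)
    return [w / total for w in weights]

def play_against_hedge(player_mix, player_payoff, learner_payoff, rounds):
    """Player commits to player_mix every round; learner runs Hedge.

    Returns the player's average expected utility and the learner's
    final distribution over its actions.
    """
    n_learner = len(learner_payoff[0])
    eta = math.sqrt(math.log(n_learner) / rounds)  # standard step size
    cumulative = [0.0] * n_learner
    total_utility = 0.0
    for _ in range(rounds):
        learner_mix = hedge_distribution(cumulative, eta)
        # Player's expected utility this round.
        total_utility += sum(
            player_mix[i] * learner_mix[j] * player_payoff[i][j]
            for i in range(len(player_mix))
            for j in range(n_learner)
        )
        # Learner observes the expected payoff of each of its actions.
        for j in range(n_learner):
            cumulative[j] += sum(
                player_mix[i] * learner_payoff[i][j]
                for i in range(len(player_mix))
            )
    return total_utility / rounds, hedge_distribution(cumulative, eta)

# Hypothetical game: rows are the player's actions (U, D),
# columns are the learner's actions (L, R).
PLAYER_PAYOFF = [[2.0, 4.0], [1.0, 3.0]]
LEARNER_PAYOFF = [[1.0, 0.0], [0.0, 1.0]]

# Commit to U with probability 0.45, so the learner's best response is R.
avg_utility, final_mix = play_against_hedge(
    [0.45, 0.55], PLAYER_PAYOFF, LEARNER_PAYOFF, rounds=10_000
)
print(avg_utility, final_mix)
```

Against this commitment the learner's play concentrates on R, and the player's average utility approaches 0.45 * 4 + 0.55 * 3 = 3.45 as the number of rounds grows, well above the 2 he gets from simultaneous play. The paper's stronger results (strictly beating the Stackelberg utility against mean-based learners with more than two actions) rely on time-varying strategies that this sketch does not attempt.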
