Learning to Steer Learners in Games
The work addresses a fundamental challenge in game theory, namely optimizing interactions with adaptive agents, but it is incremental in that it builds on existing no-regret learning frameworks.
The paper tackles the problem of steering a no-regret learner to a Stackelberg equilibrium in repeated two-player games without knowing the learner's payoffs. It shows that this is impossible when the optimizer knows only that the learner uses some no-regret algorithm, but achievable when the learner's algorithm comes from a smaller class, such as an ascent algorithm or stochastic mirror ascent with known regularizer and step sizes.
We consider the problem of learning to exploit learning algorithms through repeated interactions in games. Specifically, we focus on the case of repeated two-player, finite-action games, in which an optimizer aims to steer a no-regret learner to a Stackelberg equilibrium without knowledge of the learner's payoffs. We first show that this is impossible if the optimizer only knows that the learner is using an algorithm from the general class of no-regret algorithms. This suggests that the optimizer requires more information about the learner's objectives or algorithm to successfully exploit it. Building on this intuition, we reduce the problem for the optimizer to that of recovering the learner's payoff structure. We demonstrate the effectiveness of this approach if the learner's algorithm is drawn from a smaller class by analyzing two examples: one where the learner uses an ascent algorithm, and another where the learner uses stochastic mirror ascent with known regularizer and step sizes.
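To make concrete why the optimizer's target depends on the learner's payoffs, the following minimal sketch computes a Stackelberg outcome in a small bimatrix game. It is an illustration under stated assumptions, not the paper's construction: the optimizer is restricted to pure commitments (the general definition allows mixed ones), ties are broken in the optimizer's favor, and the function name `pure_stackelberg` and the matrices `A`, `B_one`, `B_two` are hypothetical.

```python
import numpy as np


def pure_stackelberg(A, B):
    """Return ((i, j), value) for the best pure commitment of the optimizer.

    A[i, j] is the optimizer's payoff and B[i, j] the learner's payoff when
    the optimizer plays row i and the learner plays column j.
    Assumption: pure commitments only, ties broken in the optimizer's favor.
    """
    best_value, best_pair = -np.inf, None
    for i in range(A.shape[0]):
        # All learner actions that are best responses to row i.
        best_responses = np.flatnonzero(B[i] == B[i].max())
        # Among those, pick the one the optimizer prefers.
        j = best_responses[np.argmax(A[i, best_responses])]
        if A[i, j] > best_value:
            best_value, best_pair = A[i, j], (i, int(j))
    return best_pair, float(best_value)


# Hypothetical optimizer payoffs, identical in both games.
A = np.array([[2.0, 4.0],
              [1.0, 3.0]])

# Two hypothetical learner payoff matrices the optimizer cannot tell apart a priori.
B_one = np.array([[1.0, 0.0],
                  [0.0, 1.0]])
B_two = np.array([[0.0, 1.0],
                  [1.0, 0.0]])

print(pure_stackelberg(A, B_one))
print(pure_stackelberg(A, B_two))
```

With the same optimizer payoffs, the two learner payoff matrices lead to different commitments (row 1 in the first game, row 0 in the second). This is consistent with the reduction described above: the optimizer first has to recover enough of the learner's payoff structure to know which outcome to steer toward.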