GT DS LG

Aug 19, 2022

Learning in Stackelberg Games with Non-myopic Agents

Nika Haghtalab, Thodoris Lykouris, Sloan Nietert, Alexander Wei

Berkeley

arXiv:2208.09407v3

17.843 citations

h-index: 25

Has Code

Originality: Highly original

AI Analysis

The paper addresses strategic deception by long-lived agents in repeated interactions. The contribution is assessed as incremental, but with specific gains in applications such as security games and pricing.

The paper tackles learning in Stackelberg games with non-myopic agents when the principal does not know the agent's payoff function. It develops a framework that reduces this problem to robust bandit optimization with myopic agents and, in Stackelberg security games with $n$ targets, improves the query complexity from $O(n^3)$ to a near-optimal $\widetilde{O}(n)$.

We study Stackelberg games where a principal repeatedly interacts with a non-myopic, long-lived agent without knowing the agent's payoff function. Although learning in Stackelberg games is well understood when the agent is myopic, dealing with non-myopic agents poses additional complications. In particular, non-myopic agents may strategize and select actions that are inferior in the present in order to mislead the principal's learning algorithm and obtain better outcomes in the future. We provide a general framework that reduces learning in the presence of non-myopic agents to robust bandit optimization in the presence of myopic agents. Through the design and analysis of minimally reactive bandit algorithms, our reduction trades off the statistical efficiency of the principal's learning algorithm against its effectiveness in inducing near-best-responses. We apply this framework to Stackelberg security games (SSGs), pricing with an unknown demand curve, general finite Stackelberg games, and strategic classification. In each setting, we characterize the type and impact of misspecifications present in near-best-responses and develop a learning algorithm robust to such misspecifications. Along the way, we improve the state-of-the-art query complexity of learning in SSGs with $n$ targets from $O(n^3)$ to a near-optimal $\widetilde{O}(n)$ by uncovering a fundamental structural property of these games. The latter result is of independent interest beyond learning with non-myopic agents.
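The tradeoff between reactivity and near-best-responses can be illustrated with a small calculation. The sketch below is not taken from the abstract: it assumes, purely for illustration, that the agent discounts future utility by a factor `gamma` per round, that per-round utilities lie in [0, 1], and that the principal's algorithm reacts to an agent action only after `delay` rounds. Under those assumptions, whatever the agent hopes to gain by misleading the principal is worth at most `gamma**delay / (1 - gamma)` in present terms, so the agent's action is within that amount of a best response. The function names are hypothetical.

```python
def deviation_gain_bound(gamma: float, delay: int) -> float:
    """Upper bound on discounted utility collected from round `delay` onward.

    Assumption (not from the source): per-round utility lies in [0, 1] and is
    discounted by `gamma` per round, so the tail sum is gamma^delay / (1 - gamma).
    """
    return gamma ** delay / (1.0 - gamma)


def min_delay(gamma: float, eps: float) -> int:
    """Smallest delay for which the bound above is at most `eps`.

    With this delay, a discounted agent gains at most `eps` by deviating from
    its myopic best response, i.e. it behaves as an eps-best-responder.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    delay = 0
    # Increase the delay until the discounted tail is small enough.
    while deviation_gain_bound(gamma, delay) > eps:
        delay += 1
    return delay


if __name__ == "__main__":
    for gamma in (0.5, 0.9, 0.99):
        d = min_delay(gamma, eps=0.1)
        print(f"gamma={gamma}: delay={d}, bound={deviation_gain_bound(gamma, d):.4f}")
```

In this toy model, a more patient agent (larger `gamma`) requires a longer delay before its behavior is close to myopic, while a longer delay means the principal learns from feedback more slowly. That is one way to read the tension between statistical efficiency and inducing near-best-responses described above.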
