LGMLApr 20, 2020

Approximate exploitability: Learning a best response in large games

arXiv:2004.09677v536 citations
AI Analysis

This addresses the need for robust agent evaluation in AI systems, particularly for trust and safety in adversarial settings, though it is incremental as it builds on prior work in computer poker and search methods.

The paper tackles the problem of evaluating worst-case performance of agents in large games, where exact computation is infeasible, by introducing ISMCTS-BR, a scalable search-based deep reinforcement learning algorithm that approximates worst-case outcomes by learning a best response, and demonstrates it in two-player zero-sum games against various agents, including AlphaZero-based ones.

Researchers have demonstrated that neural networks are vulnerable to adversarial examples and subtle environment changes, both of which one can view as a form of distribution shift. To humans, the resulting errors can look like blunders, eroding trust in these agents. In prior games research, agent evaluation often focused on the in-practice game outcomes. While valuable, such evaluation typically fails to evaluate robustness to worst-case outcomes. Prior research in computer poker has examined how to assess such worst-case performance, both exactly and approximately. Unfortunately, exact computation is infeasible with larger domains, and existing approximations rely on poker-specific knowledge. We introduce ISMCTS-BR, a scalable search-based deep reinforcement learning algorithm for learning a best response to an agent, thereby approximating worst-case performance. We demonstrate the technique in several two-player zero-sum games against a variety of agents, including several AlphaZero-based agents.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes