GTOct 17, 2023
Guarantees for Self-Play in Multiplayer Games via Polymatrix DecomposabilityRevan MacQueen, James R. Wright
Self-play is a technique for machine learning in multi-agent systems where a learning algorithm learns by interacting with copies of itself. Self-play is useful for generating large quantities of data for learning, but has the drawback that the agents the learner will face post-training may have dramatically different behavior than the learner came to expect by interacting with itself. For the special case of two-player constant-sum games, self-play that reaches Nash equilibrium is guaranteed to produce strategies that perform well against any post-training opponent; however, no such guarantee exists for multiplayer games. We show that in games that approximately decompose into a set of two-player constant-sum games (called constant-sum polymatrix games) where global $ε$-Nash equilibria are boundedly far from Nash equilibria in each subgame (called subgame stability), any no-external-regret algorithm that learns by self-play will produce a strategy with bounded vulnerability. For the first time, our results identify a structural property of multiplayer games that enable performance guarantees for the strategies produced by a broad class of self-play algorithms. We demonstrate our findings through experiments on Leduc poker.
GTApr 26, 2025
Approximating Nash Equilibria in General-Sum Games via Meta-LearningDavid Sychrovský, Christopher Solinas, Revan MacQueen et al.
Nash equilibrium is perhaps the best-known solution concept in game theory. Such a solution assigns a strategy to each player which offers no incentive to unilaterally deviate. While a Nash equilibrium is guaranteed to always exist, the problem of finding one in general-sum games is PPAD-complete, generally considered intractable. Regret minimization is an efficient framework for approximating Nash equilibria in two-player zero-sum games. However, in general-sum games, such algorithms are only guaranteed to converge to a coarse-correlated equilibrium (CCE), a solution concept where players can correlate their strategies. In this work, we use meta-learning to minimize the correlations in strategies produced by a regret minimizer. This encourages the regret minimizer to find strategies that are closer to a Nash equilibrium. The meta-learned regret minimizer is still guaranteed to converge to a CCE, but we give a bound on the distance to Nash equilibrium in terms of our meta-loss. We evaluate our approach in general-sum imperfect information games. Our algorithms provide significantly better approximations of Nash equilibria than state-of-the-art regret minimization techniques.
CRDec 1, 2020
Game-Theoretic Malware DetectionRevan MacQueen, Natalie Bombardieri, James R. Wright et al.
Malware attacks are costly. To mitigate against such attacks, organizations deploy malware detection tools that help them detect and eventually resolve those threats. While running only the best available tool does not provide enough coverage of the potential attacks, running all available tools is prohibitively expensive in terms of financial cost and computing resources. Therefore, an organization typically runs a set of tools that maximizes their coverage given a limited budget. However, how should an organization choose that set? Attackers are strategic, and will change their behavior to preferentially exploit the gaps left by a deterministic choice of tools. To avoid leaving such easily-exploited gaps, the defender must choose a random set. In this paper, we present an approach to compute an optimal randomization over size-bounded sets of available security analysis tools by modeling the relationship between attackers and security analysts as a leader-follower Stackelberg security game. We estimate the parameters of our model by combining the information from the VirusTotal dataset with the more detailed reports from the National Vulnerability Database. In an empirical comparison, our approach outperforms a set of natural baselines under a wide range of assumptions.