51.6GTMar 17
Steering No-Regret Learners to a Desired EquilibriumBrian Hu Zhang, Gabriele Farina, Ioannis Anagnostides et al.
A mediator observes no-regret learners playing an extensive-form game repeatedly across $T$ rounds. The mediator attempts to steer players toward some desirable predetermined equilibrium by giving (nonnegative) payments to players. We call this the steering problem. The steering problem captures problems several problems of interest, among them equilibrium selection and information design (persuasion). If the mediator's budget is unbounded, steering is trivial because the mediator can simply pay the players to play desirable actions. We study two bounds on the mediator's payments: a total budget and a per-round budget. If the mediator's total budget does not grow with $T$, we show that steering is impossible. However, we show that it is enough for the total budget to grow sublinearly with $T$, that is, for the average payment to vanish. When players' full strategies are observed at each round, we show that constant per-round budgets permit steering. In the more challenging setting where only trajectories through the game tree are observable, we show that steering is impossible with constant per-round budgets in general extensive-form games, but possible in normal-form games or if the per-round budget may itself depend on $T$. We also show how our results can be generalized to the case when the equilibrium is being computed online while steering is happening. We supplement our theoretical positive results with experiments highlighting the efficacy of steering in large games.
53.8GTMay 12
Learning a Game by Paying the AgentsBrian Hu Zhang, Tao Lin, Yiling Chen et al.
We study the problem of learning the utility functions of no-regret learning agents in a repeated normal-form game. Differing from most prior literature, we introduce a principal with the power to observe the agents playing the game, send agents signals, and give agents payments as a function of their actions. We show that the principal can, using a number of rounds polynomial in the size of the game, learn the utility functions of all agents to any desired precision $ε> 0$, for any no-regret learning algorithms of the agents. Our main technique is to formulate a zero-sum game between the principal and the agents, where the principal chooses strategies among the set of all payment functions to minimize the agent's payoff. Finally, we discuss implications for the problem of steering agents. We introduce, using our utility-learning algorithm as a subroutine, the first algorithm for steering arbitrary no-regret learning agents to a desired equilibrium without prior knowledge of their utility functions.
70.1GTMay 17
On the Complexity of Correlated Equilibria Beyond Normal-Form GamesIoannis Anagnostides, Constantinos Daskalakis, Gabriele Farina et al.
Correlated equilibria are a fundamental solution concept in game theory. However, despite decades of research, the complexity beyond games of polynomial type -- such as extensive-form games, congestion or routing games, and more broadly concave games -- has remained a major open problem, first highlighted by Papadimitriou and Roughgarden (JACM '08). In this paper, we resolve several long-standing questions concerning the complexity of correlated equilibria and swap regret minimization. First, we show that computing a correlated equilibrium in concave quadratic games is as hard as computing the fixed point of a contraction mapping (Contr), providing the first strong evidence of intractability. Moreover, we establish an unconditional, information-theoretic lower bound ruling out the existence of a strongly sublinear swap regret minimizer: any online learning algorithm requires exponentially many iterations in the dimension $d$ to guarantee at most $1/\text{poly}(d)$ (average) swap regret. To circumvent these hardness results, we examine the complexity of $Φ$-equilibria -- tractable relaxations of correlated equilibria. We obtain a fully polynomial-time approximation scheme (FPTAS) for computing poly-dimensional $Φ$-equilibria in general concave games. We complement this by showing that Contr-hardness persists even under poly-dimensional swap deviations in the regime where the precision $ε$ is exponentially small. Finally, we show that Contr-hardness can be bypassed in the canonical setting of concave \emph{quadratic games}, for which we provide a $\text{poly}(d, \log(1/ε))$-time algorithm for computing poly-dimensional $Φ$-equilibria. As a byproduct, we obtain an algorithm for computing fixed points of a mapping that is contracting with respect to an unknown Mahalanobis norm, which could be of independent interest.
GTFeb 16
Decision Making under Imperfect Recall: Algorithms and BenchmarksEmanuel Tewolde, Brian Hu Zhang, Ioannis Anagnostides et al.
In game theory, imperfect-recall decision problems model situations in which an agent forgets information it held before. They encompass games such as the ``absentminded driver'' and team games with limited communication. In this paper, we introduce the first benchmark suite for imperfect-recall decision problems. Our benchmarks capture a variety of problem types, including ones concerning privacy in AI systems that elicit sensitive information, and AI safety via testing of agents in simulation. Across 61 problem instances generated using this suite, we evaluate the performance of different algorithms for finding first-order optimal strategies in such problems. In particular, we introduce the family of regret matching (RM) algorithms for nonlinear constrained optimization. This class of parameter-free algorithms has enjoyed tremendous success in solving large two-player zero-sum games, but, surprisingly, they were hitherto relatively unexplored beyond that setting. Our key finding is that RM algorithms consistently outperform commonly employed first-order optimizers such as projected gradient descent, often by orders of magnitude. This establishes, for the first time, the RM family as a formidable approach to large-scale constrained optimization problems.
29.1GTMay 3
Efficient representations for team and imperfect-recall equilibrium computationLuca Carminati, Brian Hu Zhang, Federico Cacciamani et al.
Equilibrium finding in two-player zero-sum games with perfect recall is a well-studied topic that has led to many breakthroughs in computational game theory. This paper aims to generalize such techniques to (timeable) two-player zero-sum games with imperfect recall, or equivalently to two-team zero-sum games. In this setting, the problem of computing a mixed-strategy Nash equilibrium (or, equivalently, a team maxmin equilibrium with correlation) is known to be NP-hard. We connect the imperfect-recall setting with its perfect-recall counterpart through a novel construction we call the belief game. This is a perfect-recall game equivalent to a given (timeable) two-player zero-sum game with imperfect recall. The belief game may be exponentially larger than the original game but can be solved using any standard method. We then show that the strategy spaces of the two players in the belief game can be directly represented as a DAG, leading to a possibly exponential speedup. We call this the team belief DAG (TB-DAG). The TB-DAG simultaneously enjoys essentially optimal parameterized complexity bounds and the advantages of efficient regret minimization techniques. Along the way, we show $Δ_2^P$-completeness and $Σ_2^P$-completeness of finding Nash equilibria in both mixed and behavioral strategies for the class of games we consider. Experimentally, we show that the TB-DAG, when paired with existing learning techniques, yields state-of-the-art performance on a wide variety of benchmark team games.
GTJan 15, 2025
Computing Game Symmetries and Equilibria That Respect ThemEmanuel Tewolde, Brian Hu Zhang, Caspar Oesterheld et al.
Strategic interactions can be represented more concisely, and analyzed and solved more efficiently, if we are aware of the symmetries within the multiagent system. Symmetries also have conceptual implications, for example for equilibrium selection. We study the computational complexity of identifying and using symmetries. Using the classical framework of normal-form games, we consider game symmetries that can be across some or all players and/or actions. We find a strong connection between game symmetries and graph automorphisms, yielding graph automorphism and graph isomorphism completeness results for characterizing the symmetries present in a game. On the other hand, we also show that the problem becomes polynomial-time solvable when we restrict the consideration of actions in one of two ways. Next, we investigate when exactly game symmetries can be successfully leveraged for Nash equilibrium computation. We show that finding a Nash equilibrium that respects a given set of symmetries is PPAD- and CLS-complete in general-sum and team games respectively -- that is, exactly as hard as Brouwer fixed point and gradient descent problems. Finally, we present polynomial-time methods for the special cases where we are aware of a vast number of symmetries, or where the game is two-player zero-sum and we do not even know the symmetries.
GTJun 2, 2025
General search techniques without common knowledge for imperfect-information games, and application to superhuman Fog of War chessBrian Hu Zhang, Tuomas Sandholm
Since the advent of AI, games have served as progress benchmarks. Meanwhile, imperfect-information variants of chess have existed for over a century, present extreme challenges, and have been the focus of significant AI research. Beyond calculation needed in regular chess, they require reasoning about information gathering, the opponent's knowledge, signaling, etc. The most popular variant, Fog of War (FoW) chess (aka. dark chess) is a recognized challenge problem in AI after superhuman performance was reached in no-limit Texas hold'em poker. We present Obscuro, the first superhuman AI for FoW chess. It introduces advances to search in imperfect-information games, enabling strong, scalable reasoning. Experiments against the prior state-of-the-art AI and human players -- including the world's best -- show that Obscuro is significantly stronger. FoW chess is the largest (by amount of imperfect information) turn-based game in which superhuman performance has been achieved and the largest game in which imperfect-information search has been successfully applied.
MLFeb 25, 2025
Learning and Computation of $Φ$-Equilibria at the Frontier of TractabilityBrian Hu Zhang, Ioannis Anagnostides, Emanuel Tewolde et al.
$Φ$-equilibria -- and the associated notion of $Φ$-regret -- are a powerful and flexible framework at the heart of online learning and game theory, whereby enriching the set of deviations $Φ$ begets stronger notions of rationality. Recently, Daskalakis, Farina, Fishelson, Pipis, and Schneider (STOC '24) -- abbreviated as DFFPS -- settled the existence of efficient algorithms when $Φ$ contains only linear maps under a general, $d$-dimensional convex constraint set $\mathcal{X}$. In this paper, we significantly extend their work by resolving the case where $Φ$ is $k$-dimensional; degree-$\ell$ polynomials constitute a canonical such example with $k = d^{O(\ell)}$. In particular, positing only oracle access to $\mathcal{X}$, we obtain two main positive results: i) a $\text{poly}(n, d, k, \text{log}(1/ε))$-time algorithm for computing $ε$-approximate $Φ$-equilibria in $n$-player multilinear games, and ii) an efficient online algorithm that incurs average $Φ$-regret at most $ε$ using $\text{poly}(d, k)/ε^2$ rounds. We also show nearly matching lower bounds in the online learning setting, thereby obtaining for the first time a family of deviations that captures the learnability of $Φ$-regret. From a technical standpoint, we extend the framework of DFFPS from linear maps to the more challenging case of maps with polynomial dimension. At the heart of our approach is a polynomial-time algorithm for computing an expected fixed point of any $φ: \mathcal{X} \to \mathcal{X}$ based on the ellipsoid against hope (EAH) algorithm of Papadimitriou and Roughgarden (JACM '08). In particular, our algorithm for computing $Φ$-equilibria is based on executing EAH in a nested fashion -- each step of EAH itself being implemented by invoking a separate call to EAH.
GTFeb 25, 2025
Expected Variational InequalitiesBrian Hu Zhang, Ioannis Anagnostides, Emanuel Tewolde et al.
Variational inequalities (VIs) encompass many fundamental problems in diverse areas ranging from engineering to economics and machine learning. However, their considerable expressivity comes at the cost of computational intractability. In this paper, we introduce and analyze a natural relaxation -- which we refer to as expected variational inequalities (EVIs) -- where the goal is to find a distribution that satisfies the VI constraint in expectation. By adapting recent techniques from game theory, we show that, unlike VIs, EVIs can be solved in polynomial time under general (nonmonotone) operators. EVIs capture the seminal notion of correlated equilibria, but enjoy a greater reach beyond games. We also employ our framework to capture and generalize several existing disparate results, including from settings such as smooth games, and games with coupled constraints or nonconcave utilities.
GTOct 20, 2025
Convergence of Regret Matching in Potential Games and Constrained OptimizationIoannis Anagnostides, Emanuel Tewolde, Brian Hu Zhang et al.
Regret matching (RM) -- and its modern variants -- is a foundational online algorithm that has been at the heart of many AI breakthrough results in solving benchmark zero-sum games, such as poker. Yet, surprisingly little is known so far in theory about its convergence beyond two-player zero-sum games. For example, whether regret matching converges to Nash equilibria in potential games has been an open problem for two decades. Even beyond games, one could try to use RM variants for general constrained optimization problems. Recent empirical evidence suggests that they -- particularly regret matching$^+$ (RM$^+$) -- attain strong performance on benchmark constrained optimization problems, outperforming traditional gradient descent-type algorithms. We show that RM$^+$ converges to an $ε$-KKT point after $O_ε(1/ε^4)$ iterations, establishing for the first time that it is a sound and fast first-order optimizer. Our argument relates the KKT gap to the accumulated regret, two quantities that are entirely disparate in general but interact in an intriguing way in our setting, so much so that when regrets are bounded, our complexity bound improves all the way to $O_ε(1/ε^2)$. From a technical standpoint, while RM$^+$ does not have the usual one-step improvement property in general, we show that it does in a certain region that the algorithm will quickly reach and remain in thereafter. In sharp contrast, our second main result establishes a lower bound: RM, with or without alternation, can take an exponential number of iterations to reach a crude approximate solution even in two-player potential games. This represents the first worst-case separation between RM and RM$^+$. Our lower bound shows that convergence to coarse correlated equilibria in potential games is exponentially faster than convergence to Nash equilibria.
GTOct 6, 2025
Scale-Invariant Regret Matching and Online Learning with Optimal Convergence: Bridging Theory and Practice in Zero-Sum GamesBrian Hu Zhang, Ioannis Anagnostides, Tuomas Sandholm
A considerable chasm has been looming for decades between theory and practice in zero-sum game solving through first-order methods. Although a convergence rate of $T^{-1}$ has long been established since Nemirovski's mirror-prox algorithm and Nesterov's excessive gap technique in the early 2000s, the most effective paradigm in practice is *counterfactual regret minimization*, which is based on *regret matching* and its modern variants. In particular, the state of the art across most benchmarks is *predictive* regret matching$^+$ (PRM$^+$), in conjunction with non-uniform averaging. Yet, such algorithms can exhibit slower $Ω(T^{-1/2})$ convergence even in self-play. In this paper, we close the gap between theory and practice. We propose a new scale-invariant and parameter-free variant of PRM$^+$, which we call IREG-PRM$^+$. We show that it achieves $T^{-1/2}$ best-iterate and $T^{-1}$ (i.e., optimal) average-iterate convergence guarantees, while also being on par with PRM$^+$ on benchmark games. From a technical standpoint, we draw an analogy between IREG-PRM$^+$ and optimistic gradient descent with *adaptive* learning rate. The basic flaw of PRM$^+$ is that the ($\ell_2$-)norm of the regret vector -- which can be thought of as the inverse of the learning rate -- can decrease. By contrast, we design IREG-PRM$^+$ so as to maintain the invariance that the norm of the regret vector is nondecreasing. This enables us to derive an RVU-type bound for IREG-PRM$^+$, the first such property that does not rely on introducing additional hyperparameters to enforce smoothness. Furthermore, we find that IREG-PRM$^+$ performs on par with an adaptive version of optimistic gradient descent that we introduce whose learning rate depends on the misprediction error, demystifying the effectiveness of the regret matching family *vis-a-vis* more standard optimization techniques.
OCApr 4, 2025
A Polynomial-Time Algorithm for Variational Inequalities under the Minty ConditionIoannis Anagnostides, Gabriele Farina, Tuomas Sandholm et al.
Solving variational inequalities (SVIs) is a foundational problem at the heart of optimization. However, this expressivity comes at the cost of computational hardness. As a result, most research has focused on carving out specific subclasses that elude those intractability barriers. A classical property that goes back to the 1960s is the Minty condition, which postulates that the Minty VI (MVI) problem admits a solution. In this paper, we establish the first polynomial-time algorithm -- that is, with complexity growing polynomially in the dimension $d$ and $\log(1/ε)$ -- for solving $ε$-SVIs for Lipschitz continuous mappings under the Minty condition. Prior approaches either incurred an exponentially worse dependence on $1/ε$ or made restrictive assumptions. To do so, we introduce a new variant of the ellipsoid algorithm whereby separating hyperplanes are obtained after taking a gradient descent step from the center of the ellipsoid. It succeeds even though the set of SVIs can be nonconvex and not fully dimensional. Moreover, when our algorithm is applied to an instance with no MVI solution and fails to identify an SVI solution, it produces a succinct certificate of MVI infeasibility. We also show that deciding whether the Minty condition holds is $\mathsf{coNP}$-complete, thereby establishing that the disjunction of those two problems is polynomial-time solvable even though each problem is individually intractable. We provide several extensions and new applications of our main results. Most notably, we obtain the first polynomial-time algorithms for i) globally minimizing a (potentially nonsmooth) quasar-convex function, and ii) computing Nash equilibria in multi-player harmonic games. Finally, in two-player general-sum concave games, we give the first polynomial-time algorithm that outputs either a Nash equilibrium or a strict coarse correlated equilibrium.
GTJun 23, 2024
Imperfect-Recall Games: Equilibrium Concepts and Their ComplexityEmanuel Tewolde, Brian Hu Zhang, Caspar Oesterheld et al.
We investigate optimal decision making under imperfect recall, that is, when an agent forgets information it once held before. An example is the absentminded driver game, as well as team games in which the members have limited communication capabilities. In the framework of extensive-form games with imperfect recall, we analyze the computational complexities of finding equilibria in multiplayer settings across three different solution concepts: Nash, multiselves based on evidential decision theory (EDT), and multiselves based on causal decision theory (CDT). We are interested in both exact and approximate solution computation. As special cases, we consider (1) single-player games, (2) two-player zero-sum games and relationships to maximin values, and (3) games without exogenous stochasticity (chance nodes). We relate these problems to the complexity classes P, PPAD, PLS, $Σ_2^P$ , $\exists$R, and $\exists \forall$R.
GTJun 5, 2020
Sparsified Linear Programming for Zero-Sum Equilibrium FindingBrian Hu Zhang, Tuomas Sandholm
Computational equilibrium finding in large zero-sum extensive-form imperfect-information games has led to significant recent AI breakthroughs. The fastest algorithms for the problem are new forms of counterfactual regret minimization [Brown and Sandholm, 2019]. In this paper we present a totally different approach to the problem, which is competitive and often orders of magnitude better than the prior state of the art. The equilibrium-finding problem can be formulated as a linear program (LP) [Koller et al., 1994], but solving it as an LP has not been scalable due to the memory requirements of LP solvers, which can often be quadratically worse than CFR-based algorithms. We give an efficient practical algorithm that factors a large payoff matrix into a product of two matrices that are typically dramatically sparser. This allows us to express the equilibrium-finding problem as a linear program with size only a logarithmic factor worse than CFR, and thus allows linear program solvers to run on such games. With experiments on poker endgames, we demonstrate in practice, for the first time, that modern linear program solvers are competitive against even game-specific modern variants of CFR in solving large extensive-form games, and can be used to compute exact solutions unlike iterative algorithms like CFR.
LGNov 15, 2018
A Spectral View of Adversarially Robust FeaturesShivam Garg, Vatsal Sharan, Brian Hu Zhang et al.
Given the apparent difficulty of learning models that are robust to adversarial perturbations, we propose tackling the simpler problem of developing adversarially robust features. Specifically, given a dataset and metric of interest, the goal is to return a function (or multiple functions) that 1) is robust to adversarial perturbations, and 2) has significant variation across the datapoints. We establish strong connections between adversarially robust features and a natural spectral property of the geometry of the dataset and metric of interest. This connection can be leveraged to provide both robust features, and a lower bound on the robustness of any function that has significant variance across the dataset. Finally, we provide empirical evidence that the adversarially robust features given by this spectral approach can be fruitfully leveraged to learn a robust (and accurate) model.
LGJan 22, 2018
Mitigating Unwanted Biases with Adversarial LearningBrian Hu Zhang, Blake Lemoine, Margaret Mitchell
Machine learning is a tool for building models that accurately represent input training data. When undesired biases concerning demographic groups are in the training data, well-trained models will reflect those biases. We present a framework for mitigating such biases by including a variable for the group of interest and simultaneously learning a predictor and an adversary. The input to the network X, here text or census data, produces a prediction Y, such as an analogy completion or income bracket, while the adversary tries to model a protected variable Z, here gender or zip code. The objective is to maximize the predictor's ability to predict Y while minimizing the adversary's ability to predict Z. Applied to analogy completion, this method results in accurate predictions that exhibit less evidence of stereotyping Z. When applied to a classification task using the UCI Adult (Census) Dataset, it results in a predictive model that does not lose much accuracy while achieving very close to equality of odds (Hardt, et al., 2016). The method is flexible and applicable to multiple definitions of fairness as well as a wide range of gradient-based learning models, including both regression and classification tasks.