LG · GT · Jun 17, 2025

Two-Player Zero-Sum Games with Bandit Feedback

arXiv:2506.14518v2 · 2 citations · h-index: 2
Originality: Incremental advance
AI Analysis

This work fills a specific theoretical gap in game theory, though it appears incremental because it adapts existing frameworks to a new setting.

The authors tackled the problem of learning pure strategy Nash Equilibria in two-player zero-sum games with bandit feedback, proposing three Explore-Then-Commit-based algorithms that achieve instance-dependent regret bounds of O(Δ + √T) and O(log(TΔ²)/Δ).

We study a two-player zero-sum game in which the row player aims to maximize their payoff against an adversarial column player, under an unknown payoff matrix estimated through bandit feedback. We propose three algorithms based on the Explore-Then-Commit (ETC) framework. The first adapts ETC to zero-sum games, the second incorporates adaptive elimination that leverages the $\varepsilon$-Nash Equilibrium property to efficiently select the optimal action pair, and the third extends the elimination algorithm by employing non-uniform exploration. Our objective is to demonstrate the applicability of ETC in a zero-sum game setting by focusing on learning pure strategy Nash Equilibria. A key contribution of our work is the derivation of instance-dependent upper bounds on the expected regret of our proposed algorithms, a topic that has received limited attention in the literature on zero-sum games. In particular, after $T$ rounds, we achieve instance-dependent regret upper bounds of $O(\Delta + \sqrt{T})$ for ETC in the zero-sum game setting and $O(\log(T \Delta^2) / \Delta)$ for the adaptive elimination algorithm and its variant with non-uniform exploration, where $\Delta$ denotes the suboptimality gap. Our results therefore indicate that ETC-based algorithms perform effectively in adversarial game settings, achieving regret bounds comparable to existing methods while providing insight through instance-dependent analysis.
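To make the Explore-Then-Commit idea concrete, the following is a minimal Python sketch of one plausible reading of the first algorithm, not the authors' implementation. It assumes that every action pair is sampled the same number of times during exploration, that feedback is the true payoff plus Gaussian noise, and that the commit step selects the empirical maximin row together with the column that minimizes the estimated payoff in that row. The function name `etc_zero_sum`, its parameters, and the example matrix are all hypothetical.

```python
import numpy as np


def etc_zero_sum(payoff, rounds_per_pair, noise_std=1.0, seed=0):
    """Explore every action pair uniformly, then commit to the empirical saddle point.

    payoff          : true payoff matrix (unknown to the learner, used only to simulate feedback)
    rounds_per_pair : number of noisy samples drawn for each (row, column) pair
    noise_std       : standard deviation of the assumed Gaussian feedback noise
    """
    rng = np.random.default_rng(seed)
    payoff = np.asarray(payoff, dtype=float)
    n_rows, n_cols = payoff.shape

    # Exploration phase: collect bandit feedback for each action pair.
    estimate = np.zeros((n_rows, n_cols))
    for i in range(n_rows):
        for j in range(n_cols):
            samples = payoff[i, j] + noise_std * rng.standard_normal(rounds_per_pair)
            estimate[i, j] = samples.mean()

    # Commit phase: the row player maximizes the worst-case estimated payoff,
    # and the adversarial column player minimizes within the chosen row.
    row = int(np.argmax(estimate.min(axis=1)))
    col = int(np.argmin(estimate[row]))
    return row, col, estimate


# Hypothetical 3x3 game with a pure-strategy Nash equilibrium at (row 2, column 1):
# the entry 2 is the minimum of its row and the maximum of its column.
game = [[3, 1, 4],
        [2, 0, 1],
        [5, 2, 6]]
row, col, _ = etc_zero_sum(game, rounds_per_pair=200)
print(row, col)
```

In this sketch the exploration budget is `rounds_per_pair` times the number of action pairs. The adaptive elimination variants described above would instead stop sampling action pairs once they can be ruled out; that logic is not shown here.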
