LGApr 16

Optimal last-iterate convergence in matrix games with bandit feedback using the log-barrier

Come Fiegel, Pierre Menard, Tadashi Kozuno, Michal Valko, Vianney Perchet

arXiv:2604.1524212.11 citationsh-index: 44

Predicted impact top 57% in LG · last 90 daysOriginality Highly original

AI Analysis

Provides the first algorithm to attain the optimal convergence rate for uncoupled players in zero-sum matrix games with bandit feedback, solving an open problem.

The paper achieves optimal last-iterate convergence rate of Õ(t^{-1/4}) in zero-sum matrix games with bandit feedback, matching the lower bound, and extends this result to extensive-form games.

We study the problem of learning minimax policies in zero-sum matrix games. Fiegel et al. (2025) recently showed that achieving last-iterate convergence in this setting is harder when the players are uncoupled, by proving a lower bound on the exploitability gap of Omega(t^{-1/4}). Some online mirror descent algorithms were proposed in the literature for this problem, but none have truly attained this rate yet. We show that the use of a log-barrier regularization, along with a dual-focused analysis, allows this O-tilde(t^{-1/4}) convergence with high-probability. We additionally extend our idea to the setting of extensive-form games, proving a bound with the same rate.

View on arXiv PDF

Similar