LG | Feb 13, 2025

Reevaluating Policy Gradient Methods for Imperfect-Information Games

arXiv:2502.08938v2 | 6 citations | h-index: 71 | Has Code
Originality: Incremental advance
AI Analysis

This paper evaluates deep reinforcement learning (DRL) algorithms for adversarial imperfect-information games, a topic of interest to both the machine learning and game theory communities. Its contribution is incremental: a large-scale empirical comparison of existing algorithm families.

Across more than 5600 training runs, the authors find that DRL approaches based on fictitious play (FP), double oracle (DO), and counterfactual regret minimization (CFR) fail to outperform generic policy gradient methods like PPO.

In the past decade, motivated by the putative failure of naive self-play deep reinforcement learning (DRL) in adversarial imperfect-information games, researchers have developed numerous DRL algorithms based on fictitious play (FP), double oracle (DO), and counterfactual regret minimization (CFR). In light of recent results of the magnetic mirror descent algorithm, we hypothesize that simpler generic policy gradient methods like PPO are competitive with or superior to these FP-, DO-, and CFR-based DRL approaches. To facilitate the resolution of this hypothesis, we implement and release the first broadly accessible exact exploitability computations for four large games. Using these games, we conduct the largest-ever exploitability comparison of DRL algorithms for imperfect-information games. Over 5600 training runs, we find that FP-, DO-, and CFR-based approaches fail to outperform generic policy gradient methods. Code is available at https://github.com/nathanlct/IIG-RL-Benchmark and https://github.com/gabrfarina/exp-a-spiel .
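As a rough illustration of the exploitability metric the abstract refers to, here is a minimal sketch for a two-player zero-sum matrix game. It is not the paper's implementation: the benchmark games are large extensive-form games, whereas this toy uses a normal-form payoff matrix. The function name `exploitability`, the `RPS` matrix, and the convention of averaging the two best-response gains are all assumptions made for this example (some authors report the sum instead).

```python
import numpy as np

# Hypothetical toy game: rock-paper-scissors payoffs for the row player.
# The column player receives the negation (zero-sum).
RPS = np.array([
    [0.0, -1.0, 1.0],   # rock     vs rock, paper, scissors
    [1.0, 0.0, -1.0],   # paper
    [-1.0, 1.0, 0.0],   # scissors
])


def exploitability(payoff, x, y):
    """Average best-response gain against the strategy profile (x, y).

    payoff[i, j] is the row player's payoff; x and y are the row and
    column players' mixed strategies (probability vectors).
    """
    payoff = np.asarray(payoff, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Best payoff the row player can obtain against y.
    row_best = np.max(payoff @ y)
    # Best payoff the column player can obtain against x.
    col_best = np.max(-(x @ payoff))
    # In a zero-sum game the current payoffs cancel, so the total gain
    # from deviating is just the sum of the two best-response values.
    return (row_best + col_best) / 2.0


uniform = np.ones(3) / 3
always_rock = np.array([1.0, 0.0, 0.0])

print(exploitability(RPS, uniform, uniform))      # equilibrium profile
print(exploitability(RPS, always_rock, uniform))  # row player is exploitable
```

A profile with zero exploitability is a Nash equilibrium; larger values mean that a best-responding opponent can gain more. In large extensive-form games the best response has to be computed over the full game tree, which is what makes exact computation expensive.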

Code Implementations: 2 repos
