GT LG MAJan 28, 2025

On the Power of Perturbation under Sampling in Solving Extensive-Form Games

Wataru Masaka, Mitsuki Sakamoto, Kenshi Abe, Kaito Ariu, Tuomas Sandholm, Atsushi Iwasaki

arXiv:2501.16600v22.31 citationsh-index: 11

Originality Incremental advance

AI Analysis

This addresses instability in game-solving algorithms for AI applications like poker, but it is incremental as it builds on existing FTRL methods with specific perturbations.

The paper tackled the problem of stabilizing learning algorithms in imperfect-information extensive-form games under sampling noise, showing that perturbed FTRL variants achieve last-iterate convergence, with PFTRL-RKL outperforming PFTRL-KL in asymmetric games like Leduc poker due to variance reduction.

We investigate how perturbation does and does not improve the Follow-the-Regularized-Leader (FTRL) algorithm in solving imperfect-information extensive-form games under sampling, where payoffs are estimated from sampled trajectories. While optimistic algorithms are effective under full feedback, they often become unstable in the presence of sampling noise. Payoff perturbation offers a promising alternative for stabilizing learning and achieving \textit{last-iterate convergence}. We present a unified framework for \textit{Perturbed FTRL} algorithms and study two variants: PFTRL-KL (standard KL divergence) and PFTRL-RKL (Reverse KL divergence), the latter featuring an estimator with both unbiasedness and conditional zero variance. While PFTRL-KL generally achieves equivalent or better performance across benchmark games, PFTRL-RKL consistently outperforms it in Leduc poker, whose structure is more asymmetric than the other games in a sense. Given the modest advantage of PFTRL-RKL, we design the second experiment to isolate the effect of conditional zero variance, showing that the variance-reduction property of RKL improve last-iterate performance.

View on arXiv PDF

Similar