GTAILGMAJan 22, 2019

Single Deep Counterfactual Regret Minimization

arXiv:1901.07621v443 citations
Originality Incremental advance
AI Analysis

This work addresses the problem of improving computational efficiency and reducing expert dependency in game-solving algorithms for researchers and practitioners in AI and game theory, though it is incremental as it builds on Deep CFR.

The paper tackles the scalability and expert-knowledge limitations of Counterfactual Regret Minimization (CFR) in imperfect information games by introducing Single Deep CFR (SD-CFR), a simplified variant that avoids training an average strategy network, resulting in lower approximation error and empirically outperforming Deep CFR in poker with respect to exploitability and one-on-one play.

Counterfactual Regret Minimization (CFR) is the most successful algorithm for finding approximate Nash equilibria in imperfect information games. However, CFR's reliance on full game-tree traversals limits its scalability. For this reason, the game's state- and action-space is often abstracted (i.e. simplified) for CFR, and the resulting strategy is then translated back to the full game, which requires extensive expert-knowledge and often converges to highly exploitable policies. A recently proposed method, Deep CFR, applies deep learning directly to CFR, allowing the agent to intrinsically abstract and generalize over the state-space from samples, without requiring expert knowledge. In this paper, we introduce Single Deep CFR (SD-CFR), a simplified variant of Deep CFR that has a lower overall approximation error by avoiding the training of an average strategy network. We show that SD-CFR is more attractive from a theoretical perspective and empirically outperforms Deep CFR with respect to exploitability and one-on-one play in poker.

Code Implementations4 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes