AI · Dec 5, 2023

DanZero+: Dominating the GuanDan Game through Reinforcement Learning

Youpeng Zhao, Yudong Lu, Jian Zhao, Wengang Zhou, Houqiang Li

arXiv:2312.02561v110.06 citations · h-index: 67 · Has Code · IEEE Trans Game

Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of developing an AI for GuanDan, a popular but intricate four-player card game. The contribution is incremental: it applies existing reinforcement learning methods to a new domain.

The researchers used reinforcement learning techniques, namely Deep Monte Carlo and a policy-based method, to build an AI program for GuanDan that outperformed baseline heuristic-based bots.

The use of artificial intelligence (AI) in card games has long been a well-explored subject in AI research. Recent advancements have propelled AI programs to showcase expertise in intricate card games such as Mahjong, DouDizhu, and Texas Hold'em. In this work, we aim to develop an AI program for an exceptionally complex and popular card game called GuanDan. This game involves four players engaging in both competitive and cooperative play throughout a long process to upgrade their level, posing great challenges for AI due to its expansive state and action space, long episode length, and complex rules. Employing reinforcement learning techniques, specifically Deep Monte Carlo (DMC), and a distributed training framework, we first put forward an AI program named DanZero for this game. Evaluation against baseline AI programs based on heuristic rules highlights the outstanding performance of our bot. In addition, to further enhance the AI's capabilities, we apply a policy-based reinforcement learning algorithm to GuanDan. To address the challenges arising from the huge action space, which significantly impacts the performance of policy-based algorithms, we adopt the pre-trained model to facilitate the training process, and the resulting AI program achieves superior performance.
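The abstract names Deep Monte Carlo but does not describe it. As a rough illustration of the general idea (sample whole episodes with an epsilon-greedy policy over legal actions, then move each visited action value toward the observed episode return, with no bootstrapping), here is a minimal sketch. It is not the authors' implementation: the stone-removal game, the tabular `q` dictionary standing in for a deep network, and all function names and hyperparameters are assumptions made for the example.

```python
import random
from collections import defaultdict

# Hypothetical toy game (not GuanDan): remove 1 or 2 stones from a pile.
# The episode ends when the pile is empty; the return is +1 if the final
# move removed 2 stones, otherwise -1.
ACTIONS = (1, 2)


def legal_actions(pile):
    """Actions that do not remove more stones than remain."""
    return [a for a in ACTIONS if a <= pile]


def greedy_action(q, pile):
    """Pick the legal action with the highest estimated value."""
    return max(legal_actions(pile), key=lambda a: q[(pile, a)])


def play_episode(q, epsilon, rng, start_pile=4):
    """Roll out one episode with epsilon-greedy action selection."""
    pile, trajectory, last = start_pile, [], None
    while pile > 0:
        if rng.random() < epsilon:
            action = rng.choice(legal_actions(pile))  # explore
        else:
            action = greedy_action(q, pile)           # exploit
        trajectory.append((pile, action))
        pile -= action
        last = action
    episode_return = 1.0 if last == 2 else -1.0
    return trajectory, episode_return


def train(episodes=2000, epsilon=0.2, lr=0.1, seed=0):
    """Monte Carlo control: regress Q(s, a) toward the full episode return.

    A table replaces the neural network here (an assumption for brevity).
    """
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        trajectory, episode_return = play_episode(q, epsilon, rng)
        for pile, action in trajectory:
            q[(pile, action)] += lr * (episode_return - q[(pile, action)])
    return q


q = train()
print(greedy_action(q, 2))
```

In a deep variant, the table update would be replaced by a gradient step on a mean-squared error between a network's prediction for the encoded state-action pair and the episode return. Scoring only the legal actions at each step is one reason value-based methods of this kind are often considered for games with very large action spaces.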
