Transformer Based Planning in the Observation Space with Applications to Trick Taking Card Games
This work addresses planning inefficiencies in imperfect-information games, with applications in AI and game theory, and represents an incremental improvement over existing methods such as PIMC.
The paper tackles the challenge of planning in imperfect-information games with large state spaces, such as trick-taking card games, by introducing Generative Observation Monte Carlo Tree Search (GO-MCTS), which uses transformers for generative modeling and achieves promising results in games such as Hearts, Skat, and 'The Crew'.
Traditional search algorithms struggle in games of imperfect information, where the number of possible underlying states and trajectories is very large. This challenge is particularly evident in trick-taking card games. While state-sampling techniques such as Perfect Information Monte Carlo (PIMC) search have shown success in these contexts, they still have major limitations. We present Generative Observation Monte Carlo Tree Search (GO-MCTS), which runs MCTS on observation sequences generated by a game-specific model. This method performs the search within the observation space and advances the search using a model that depends solely on the agent's observations. Additionally, we show that transformers are well suited as the generative model in this context, and we describe a process for iteratively training the transformer via population-based self-play. The efficacy of GO-MCTS is demonstrated in various games of imperfect information, such as Hearts, Skat, and "The Crew: The Quest for Planet Nine," with promising results.
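To make the idea of searching in the observation space concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical `ToyObservationModel` in place of the trained transformer, a two-decision toy game in place of a card game, and plain UCB1 selection with random rollouts. All names, the payoff rule, and the constants are illustrative assumptions. The point it shows is that tree nodes are keyed by the agent's own observation history and that every transition is sampled from a model that sees only that history.

```python
import math
import random
from collections import defaultdict


class ToyObservationModel:
    """Hypothetical stand-in for the learned generative model.

    It is conditioned only on the searching agent's observation history,
    a tuple of (action, observation) pairs, and never on hidden state.
    """

    HORIZON = 2  # decisions per episode in this toy game (assumption)

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def legal_actions(self, history):
        return [] if len(history) >= self.HORIZON else [0, 1]

    def sample(self, history, action):
        """Sample (next_observation, reward) after taking `action`."""
        observation = self.rng.choice("ab")  # stands in for unseen opponent play
        reward = float(action)  # toy payoff: action 1 is always better
        return observation, reward


def go_mcts_sketch(model, root=(), simulations=200, c=1.4):
    """UCB1 tree search whose nodes are observation histories."""
    visits = defaultdict(int)           # visit count per history
    action_visits = defaultdict(int)    # visit count per (history, action)
    action_value = defaultdict(float)   # mean return per (history, action)
    expanded = {root}

    def rollout(history):
        # Random playout driven entirely by the generative model.
        total = 0.0
        while model.legal_actions(history):
            action = model.rng.choice(model.legal_actions(history))
            observation, reward = model.sample(history, action)
            history += ((action, observation),)
            total += reward
        return total

    def simulate(history):
        actions = model.legal_actions(history)
        if not actions:
            return 0.0  # terminal history
        if history not in expanded:
            expanded.add(history)
            return rollout(history)

        def ucb(action):
            n = action_visits[(history, action)]
            if n == 0:
                return math.inf  # try every action once first
            bonus = c * math.sqrt(math.log(visits[history]) / n)
            return action_value[(history, action)] + bonus

        action = max(actions, key=ucb)
        observation, reward = model.sample(history, action)
        ret = reward + simulate(history + ((action, observation),))

        # Back up the return along the observation sequence.
        key = (history, action)
        visits[history] += 1
        action_visits[key] += 1
        action_value[key] += (ret - action_value[key]) / action_visits[key]
        return ret

    for _ in range(simulations):
        simulate(root)
    # Recommend the most-visited root action.
    return max(model.legal_actions(root), key=lambda a: action_visits[(root, a)])


best_action = go_mcts_sketch(ToyObservationModel(seed=0))
```

In this sketch, replacing `ToyObservationModel.sample` with a call to a learned sequence model would leave the search loop unchanged, because the search never touches the underlying game state.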