Application of Self-Play Reinforcement Learning to a Four-Player Game of Imperfect Information
This work addresses the problem of scaling reinforcement learning to multi-agent games with complex action spaces. Its contribution is incremental, however, since it applies an existing method to a new environment.
The researchers apply reinforcement learning to a complex four-player card game of imperfect information and reach a level of play that outperforms amateur human players after a short training period and without tree search.
We introduce a new virtual environment for simulating the card game "Big 2". This is a four-player game of imperfect information with a relatively complicated action space: from an initial hand of 13 cards, a player may play combinations of 1, 2, 3, 4 or 5 cards. As such, it poses a challenge for many current reinforcement learning methods. We then use the recently proposed "Proximal Policy Optimization" algorithm to train a deep neural network to play the game, learning purely via self-play, and find that it reaches a level that outperforms amateur human players after only a relatively short amount of training time and without needing to search a tree of future game states.
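
For readers unfamiliar with Proximal Policy Optimization, the following is a minimal sketch of its clipped surrogate objective in plain Python. It is an illustration of the general technique, not the implementation used in this work: the function name `ppo_clipped_objective`, the list-based inputs, and the clipping value `clip_eps=0.2` are all assumptions chosen for the example.

```python
import math


def ppo_clipped_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch of sampled actions.

    new_logp   -- log-probabilities of the sampled actions under the current policy
    old_logp   -- log-probabilities of the same actions under the policy that collected the data
    advantages -- advantage estimates for those actions
    clip_eps   -- clipping range (hypothetical value, not taken from the paper)
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the current and the data-collecting policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the clipped and unclipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Two sampled actions: one became more likely (positive advantage),
# one became less likely (negative advantage).
new_logp = [math.log(0.5), math.log(0.1)]
old_logp = [math.log(0.25), math.log(0.2)]
advantages = [1.0, -1.0]
objective = ppo_clipped_objective(new_logp, old_logp, advantages)
print(round(objective, 6))
```

In training, this quantity would be maximized with respect to the policy parameters. The clipping removes the incentive to move the policy far from the one that generated the self-play data in a single update.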