AI · Feb 16, 2018

Monte Carlo Q-learning for General Game Playing

arXiv:1802.05944v2 · 20 citations
AI Analysis

This is an incremental contribution aimed at researchers in reinforcement learning and game playing, using GGP as a testbed.

The paper tackles the problem of applying reinforcement learning to General Game Playing (GGP) by implementing Q-learning on small-board games and enhancing it with Monte Carlo Search to create QM-learning, which improves performance over pure Q-learning.

After the recent groundbreaking results of AlphaGo, we have seen a strong interest in reinforcement learning in game playing. General Game Playing (GGP) provides a good testbed for reinforcement learning. In GGP, a specification of game rules is given. GGP problems can be solved by reinforcement learning. Q-learning is one of the canonical reinforcement learning methods, and has been used in GGP by Banerjee & Stone (IJCAI 2007). In this paper we implement Q-learning in GGP for three small-board games (Tic-Tac-Toe, Connect Four, Hex), to allow comparison to Banerjee et al. As expected, Q-learning converges, although much more slowly than MCTS. Borrowing an idea from MCTS, we enhance Q-learning with Monte Carlo Search, to give QM-learning. This enhancement improves the performance of pure Q-learning. We believe that QM-learning can also be used to further improve the performance of reinforcement learning for larger games, something which we will test in future work.
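
The abstract does not say how Monte Carlo Search is combined with Q-learning, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes tabular Q-learning with self-play on Tic-Tac-Toe, a negamax-style update (the opponent's best value counts as our loss), and random playouts used to pick a move whenever the Q-table has no entry yet for the current position. All function names, the reward scheme, and the hyperparameters are hypothetical.

```python
import random

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 1 or -1 if that player has three in a row, else 0."""
    for a, b, c in LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0

def legal_moves(board):
    return [i for i, v in enumerate(board) if v == 0]

def play(board, move, player):
    return board[:move] + (player,) + board[move + 1:]

def rollout(board, player, rng):
    """Play uniformly random moves to the end; return the winner (0 = draw)."""
    while winner(board) == 0 and legal_moves(board):
        board = play(board, rng.choice(legal_moves(board)), player)
        player = -player
    return winner(board)

def monte_carlo_move(board, player, rng, n_rollouts=20):
    """Pick the move whose random playouts score best for `player`."""
    def score(move):
        nxt = play(board, move, player)
        return sum(rollout(nxt, -player, rng) * player for _ in range(n_rollouts))
    return max(legal_moves(board), key=score)

def choose_move(Q, board, player, rng, epsilon):
    """Epsilon-greedy; assumption: fall back to playouts for unseen positions."""
    moves = legal_moves(board)
    if rng.random() < epsilon:
        return rng.choice(moves)
    known = [m for m in moves if (board, player, m) in Q]
    if not known:
        return monte_carlo_move(board, player, rng)
    return max(known, key=lambda m: Q[(board, player, m)])

def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Self-play Q-learning; rewards are +1 for a win and 0 for a draw."""
    rng = random.Random(seed)
    Q = {}
    for _ in range(episodes):
        board, player = (0,) * 9, 1
        while True:
            move = choose_move(Q, board, player, rng, epsilon)
            nxt = play(board, move, player)
            done = winner(nxt) != 0 or not legal_moves(nxt)
            if winner(nxt) == player:
                target = 1.0
            elif done:
                target = 0.0
            else:
                # The opponent moves next, so their best value is our loss.
                best_reply = max(Q.get((nxt, -player, m), 0.0)
                                 for m in legal_moves(nxt))
                target = -gamma * best_reply
            key = (board, player, move)
            old = Q.get(key, 0.0)
            Q[key] = old + alpha * (target - old)
            if done:
                break
            board, player = nxt, -player
    return Q
```

Calling `train()` returns a dictionary keyed by `(board, player, move)`. Setting `n_rollouts` to a small value trades move quality for speed, which matters because the playout fallback is invoked often early in training, when most positions are still unseen.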
