LG AI NE | Mar 12, 2020

Analysis of Hyper-Parameters for Small Games: Iterations or Epochs in Self-Play?

Hui Wang, Michael Emmerich, Mike Preuss, Aske Plaat

arXiv:2003.05988v15.08 citations

Originality Synthesis-oriented

AI Analysis

This work addresses hyper-parameter tuning in self-play reinforcement learning, offering practical recommendations for researchers and practitioners, though it is incremental as it builds on existing AlphaZero methods.

The paper investigates how 12 hyper-parameters of an AlphaZero-like self-play algorithm affect training performance, using small games to keep computational cost manageable. It finds that training is highly sensitive to these choices and identifies the number of self-play iterations as the most critical factor: it subsumes MCTS search simulations, game episodes, and training epochs. The authors therefore recommend maximizing self-play iterations and setting the three inner-loop parameters to lower values.

The landmark achievements of AlphaGo Zero have created great research interest in self-play in reinforcement learning. In self-play, Monte Carlo Tree Search is used to train a deep neural network, which is then used in tree searches. Training itself is governed by many hyper-parameters. There has been surprisingly little research on design choices for hyper-parameter values and loss-functions, presumably because of the prohibitive computational cost of exploring the parameter space. In this paper, we investigate 12 hyper-parameters in an AlphaZero-like self-play algorithm and evaluate how these parameters contribute to training. We use small games to achieve meaningful exploration with moderate computational effort. The experimental results show that training is highly sensitive to hyper-parameter choices. Through multi-objective analysis we identify 4 important hyper-parameters to assess further. To start, we find the surprising result that too much training can sometimes lead to lower performance. Our main result is that the number of self-play iterations subsumes MCTS-search simulations, game-episodes, and training epochs. The intuition is that these three increase together as self-play iterations increase, and that increasing them individually is sub-optimal. A consequence of our experiments is a direct recommendation for setting hyper-parameter values in self-play: the overarching outer loop of self-play iterations should be maximized, in favor of the three inner-loop hyper-parameters, which should be set at lower values. A secondary result of our experiments concerns the choice of optimization goals, for which we also provide recommendations.
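The relationship between the outer loop and the three inner-loop hyper-parameters is easier to see when the loop nest is written out. The following is a minimal sketch, not the authors' implementation: it contains no game, no MCTS, and no network, and only counts units of work. The names `SelfPlayConfig`, `run_self_play`, and `moves_per_game`, as well as all numeric values, are assumptions made for illustration and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class SelfPlayConfig:
    # Hypothetical names for the four hyper-parameters discussed in the abstract.
    iterations: int        # outer loop: self-play iterations
    episodes: int          # inner loop: self-play games per iteration
    mcts_simulations: int  # inner loop: MCTS simulations per move
    epochs: int            # inner loop: training epochs per iteration


def run_self_play(cfg: SelfPlayConfig, moves_per_game: int = 10) -> dict:
    """Walk the loop nest of an AlphaZero-like trainer and count the work done.

    Assumption: every game lasts exactly `moves_per_game` moves. The stubs
    below stand in for real search, play, and training steps.
    """
    counts = {"mcts_simulations": 0, "training_epochs": 0, "examples": 0}
    for _ in range(cfg.iterations):
        examples = []
        # Stage 1: self-play generates training examples.
        for _ in range(cfg.episodes):
            for move in range(moves_per_game):
                # A real implementation would run this many MCTS simulations
                # guided by the current network before choosing a move.
                counts["mcts_simulations"] += cfg.mcts_simulations
                examples.append(move)  # placeholder for (state, policy, value)
        # Stage 2: the network is trained on the collected examples.
        for _ in range(cfg.epochs):
            counts["training_epochs"] += 1
        counts["examples"] += len(examples)
    return counts


# Two illustrative settings with the same total simulation and epoch budget:
# one spends it on many outer iterations, the other on larger inner loops.
many_outer = SelfPlayConfig(iterations=100, episodes=10, mcts_simulations=25, epochs=5)
few_outer = SelfPlayConfig(iterations=25, episodes=20, mcts_simulations=50, epochs=20)

print(run_self_play(many_outer))
print(run_self_play(few_outer))
```

Both settings spend the same number of MCTS simulations and training epochs under this counting model. The difference is that `many_outer` refreshes the network four times as often, which is the kind of allocation the abstract's recommendation favors. The sketch only shows where each hyper-parameter enters the loop; it says nothing about resulting playing strength.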
