AI · Dec 19, 2020

Minimax Strikes Back

arXiv:2012.10700v2

AI Analysis

This work gives AI researchers and developers a more efficient approach to learning in complete information games: with the same resources, it is reported to be at least 7 times faster than Polygames.

This paper introduces Athénan, a Minimax-based algorithm for complete information games that does not use a policy. It is reported to be significantly more efficient than Polygames, a reimplementation of AlphaZero: it remains competitive in some games even when Polygames uses 100 times more GPU resources, and its cost of generating training data is approximately 296 times lower.

Deep Reinforcement Learning reaches a superhuman level of play in many complete information games. The state-of-the-art algorithm for learning with zero knowledge is AlphaZero. We take another approach, Athénan, which uses a different, Minimax-based search algorithm called Descent, uses different learning targets, and does not use a policy. We show that for multiple games it is much more efficient than Polygames, the reimplementation of AlphaZero. It is even competitive with Polygames when Polygames uses 100 times more GPU resources (at least for some games). One of the keys to the superior performance is that the cost of generating state data for training is approximately 296 times lower with Athénan. With the same reasonable resources, Athénan without reinforcement heuristic is at least 7 times faster than Polygames, and much more than 30 times faster with reinforcement heuristic.
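
To make "Minimax-based" and "does not use a policy" concrete, here is a minimal sketch of plain depth-limited minimax on a toy subtraction game (take 1 to 3 stones, and whoever takes the last stone wins). This is not the Descent algorithm from the paper, whose details the abstract does not give. The game and the names `legal_moves`, `heuristic_value`, `minimax`, and `best_move` are assumptions chosen for illustration. The point is only that moves are chosen from state values alone, with no separate policy output.

```python
from typing import Callable, List


def legal_moves(pile: int) -> List[int]:
    """Toy game (an assumption): a player may remove 1, 2, or 3 stones."""
    return [k for k in (1, 2, 3) if k <= pile]


def heuristic_value(pile: int) -> float:
    """Stand-in for a learned value function; 0.0 means 'no opinion'."""
    return 0.0


def minimax(
    pile: int,
    depth: int,
    maximizing: bool,
    evaluate: Callable[[int], float] = heuristic_value,
) -> float:
    """Return the value of a state from the maximizing player's viewpoint."""
    if pile == 0:
        # The previous player took the last stone and therefore won.
        return -1.0 if maximizing else 1.0
    if depth == 0:
        # Search horizon reached: fall back to the value estimate.
        return evaluate(pile)
    values = [
        minimax(pile - k, depth - 1, not maximizing, evaluate)
        for k in legal_moves(pile)
    ]
    return max(values) if maximizing else min(values)


def best_move(pile: int, depth: int) -> int:
    """Pick the move with the best searched value; no policy is consulted."""
    return max(
        legal_moves(pile),
        key=lambda k: minimax(pile - k, depth - 1, False),
    )
```

In this sketch, `best_move(5, 10)` searches to the end of the game and selects the move that leaves the opponent a pile of 4. With a shallow depth the result depends entirely on `heuristic_value`, which is the slot a learned value network would fill.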
