Automatically Reinforcing a Game AI
This work addresses the challenge of enhancing AI performance in games, particularly against adaptive opponents, but it is incremental as it builds on existing portfolio methods.
The paper tackles the problem of improving game playing programs (GPPs) by using portfolio methods, where multiple programs are combined or a single GPP is decomposed into variants via parameters or random seeds. The result includes two offline approaches: BestArm, which performs well against the original GPP but poorly against learning opponents, and Nash-portfolio, which is more robust in such scenarios, along with an online learning portfolio using bandit algorithms.
A recent research trend in Artificial Intelligence (AI) is the combination of several programs into one single, stronger, program; this is termed portfolio methods. We here investigate the application of such methods to Game Playing Programs (GPPs). In addition, we consider the case in which only one GPP is available - by decomposing this single GPP into several ones through the use of parameters or even simply random seeds. These portfolio methods are trained in a learning phase. We propose two different offline approaches. The simplest one, BestArm, is a straightforward optimization of seeds or parame- ters; it performs quite well against the original GPP, but performs poorly against an opponent which repeats games and learns. The second one, namely Nash-portfolio, performs similarly in a "one game" test, and is much more robust against an opponent who learns. We also propose an online learning portfolio, which tests several of the GPP repeatedly and progressively switches to the best one - using a bandit algorithm.