AIDec 11, 2019

UCT-ADP Progressive Bias Algorithm for Solving Gomoku

arXiv:1912.05407v12.02 citationsHas Code

Originality Incremental advance

AI Analysis

This is an incremental improvement for designing Gomoku AI and similar board games, addressing specific bottlenecks in MCTS performance.

The paper tackled the problem of improving Monte Carlo Tree Search (MCTS) for Gomoku by combining Adaptive Dynamic Programming (ADP) with UCT, using a heuristic and pruning strategies to eliminate search depth defects and accelerate convergence. The result showed that this method converges to correct values faster than single UCT, though no specific numerical gains were provided.

We combine Adaptive Dynamic Programming (ADP), a reinforcement learning method and UCB applied to trees (UCT) algorithm with a more powerful heuristic function based on Progressive Bias method and two pruning strategies for a traditional board game Gomoku. For the Adaptive Dynamic Programming part, we train a shallow forward neural network to give a quick evaluation of Gomoku board situations. UCT is a general approach in MCTS as a tree policy. Our framework use UCT to balance the exploration and exploitation of Gomoku game trees while we also apply powerful pruning strategies and heuristic function to re-select the available 2-adjacent grids of the state and use ADP instead of simulation to give estimated values of expanded nodes. Experiment result shows that this method can eliminate the search depth defect of the simulation process and converge to the correct value faster than single UCT. This approach can be applied to design new Gomoku AI and solve other Gomoku-like board game.

View on arXiv PDF Code

Similar