AILGOCMay 15, 2018

Feedback-Based Tree Search for Reinforcement Learning

arXiv:1805.05935v130 citations
Originality Incremental advance
AI Analysis

This work addresses the challenge of scaling tree search methods in reinforcement learning for complex domains like video games, though it is incremental as it builds on existing MCTS approaches.

The paper tackles the problem of applying Monte-Carlo tree search (MCTS) to reinforcement learning by proposing a model-based technique that iteratively refines value and policy estimates through feedback, achieving competitive performance in a MOBA game and providing the first sample complexity bounds for such methods.

Inspired by recent successes of Monte-Carlo tree search (MCTS) in a number of artificial intelligence (AI) application domains, we propose a model-based reinforcement learning (RL) technique that iteratively applies MCTS on batches of small, finite-horizon versions of the original infinite-horizon Markov decision process. The terminal condition of the finite-horizon problems, or the leaf-node evaluator of the decision tree generated by MCTS, is specified using a combination of an estimated value function and an estimated policy function. The recommendations generated by the MCTS procedure are then provided as feedback in order to refine, through classification and regression, the leaf-node evaluator for the next iteration. We provide the first sample complexity bounds for a tree search-based RL algorithm. In addition, we show that a deep neural network implementation of the technique can create a competitive AI agent for the popular multi-player online battle arena (MOBA) game King of Glory.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes