LG | AI | NE | Nov 30, 2018

Using Monte Carlo Tree Search as a Demonstrator within Asynchronous Deep RL

arXiv:1812.00045v1 | 6 citations
Originality: Incremental advance
AI Analysis

This work addresses training inefficiencies in DRL for game environments, though it is incremental as it builds on existing A3C methods.

The paper tackles slow convergence and local optima in deep reinforcement learning by augmenting A3C with a terminal prediction auxiliary task and integrating planning algorithms like Monte Carlo tree search as demonstrators, resulting in faster learning and better policies on a mini Pommerman game.

Deep reinforcement learning (DRL) has achieved great successes in recent years with the help of novel methods and higher compute power. However, several challenges remain, such as convergence to locally optimal policies and long training times. In this paper, we first augment the Asynchronous Advantage Actor-Critic (A3C) method with a novel self-supervised auxiliary task, Terminal Prediction, which measures temporal closeness to terminal states; we call the resulting method A3C-TP. Second, we propose a new framework in which planning algorithms such as Monte Carlo tree search, or other sources of (simulated) demonstrators, can be integrated into asynchronous distributed DRL methods. Compared to vanilla A3C, both proposed methods learn faster and converge to better policies on a two-player mini version of the Pommerman game.
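
The abstract describes Terminal Prediction only as "measuring temporal closeness to terminal states" and gives no formula. As a rough illustration of what such a self-supervised target could look like, the sketch below assumes the target at step t of an N-step episode is t / N, and that the auxiliary loss is a mean squared error. Both choices, and the function names, are assumptions made for this example rather than details taken from the abstract.

```python
from typing import List


def terminal_prediction_targets(episode_length: int) -> List[float]:
    """Return one target per time step, rising linearly from 1/N to 1.

    Assumption: closeness to the terminal state is encoded as t / N,
    where N is the episode length. The abstract does not state this.
    """
    if episode_length <= 0:
        raise ValueError("episode_length must be positive")
    return [t / episode_length for t in range(1, episode_length + 1)]


def terminal_prediction_loss(predictions: List[float], targets: List[float]) -> float:
    """Mean squared error between predicted and target closeness values.

    Assumption: MSE is used here purely for illustration.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    return sum((p - y) ** 2 for p, y in zip(predictions, targets)) / len(targets)


# Targets can only be computed once the episode has ended and N is known.
targets = terminal_prediction_targets(4)
loss = terminal_prediction_loss([0.25, 0.5, 0.75, 1.0], targets)
```

Because the targets depend only on the step index and the episode length, they need no human labels, which is what makes the task self-supervised. In an A3C-style setup, a loss of this kind would typically be added to the actor and critic losses with a weighting coefficient.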

