LGAIMLDec 23, 2019

Monte-Carlo Tree Search for Policy Optimization

arXiv:1912.10648v13 citations
Originality Incremental advance
AI Analysis

This addresses the problem of local optima and poor exploration in reinforcement learning for researchers, though it is incremental as it builds on existing gradient-free and tree search techniques.

The paper tackles policy optimization in deep reinforcement learning by introducing MCTSPO, a method combining Monte-Carlo tree search with gradient-free optimization, which improves performance on tasks with deceptive or sparse rewards compared to baselines.

Gradient-based methods are often used for policy optimization in deep reinforcement learning, despite being vulnerable to local optima and saddle points. Although gradient-free methods (e.g., genetic algorithms or evolution strategies) help mitigate these issues, poor initialization and local optima are still concerns in highly nonconvex spaces. This paper presents a method for policy optimization based on Monte-Carlo tree search and gradient-free optimization. Our method, called Monte-Carlo tree search for policy optimization (MCTSPO), provides a better exploration-exploitation trade-off through the use of the upper confidence bound heuristic. We demonstrate improved performance on reinforcement learning tasks with deceptive or sparse reward functions compared to popular gradient-based and deep genetic algorithm baselines.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes