LG | AI | ML | Oct 12, 2020

Local Search for Policy Iteration in Continuous Control

arXiv:2010.05545v1 | 17 citations
Originality: Incremental advance
AI Analysis

This work addresses data efficiency and computational challenges in reinforcement learning for continuous control, offering incremental improvements by unifying model-based and model-free approaches.

The paper aims to improve data efficiency and computational performance in reinforcement learning for continuous control. It introduces a single algorithm for local, regularized policy improvement that covers both model-based and model-free variants. The reported results show improved data efficiency on several continuous control benchmarks when a model is learned in parallel, and significant wall-clock time gains in high-dimensional domains when a ground truth model is available.

We present an algorithm for local, regularized, policy improvement in reinforcement learning (RL) that allows us to formulate model-based and model-free variants in a single framework. Our algorithm can be interpreted as a natural extension of work on KL-regularized RL and introduces a form of tree search for continuous action spaces. We demonstrate that additional computation spent on model-based policy improvement during learning can improve data efficiency, and confirm that model-based policy improvement during action selection can also be beneficial. Quantitatively, our algorithm improves data efficiency on several continuous control benchmarks (when a model is learned in parallel), and it provides significant improvements in wall-clock time in high-dimensional domains (when a ground truth model is available). The unified framework also helps us to better understand the space of model-based and model-free algorithms. In particular, we demonstrate that some benefits attributed to model-based RL can be obtained without a model, simply by utilizing more computation.
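To make the idea of KL-regularized policy improvement more concrete, here is a minimal sketch of the standard sample-based form, in which the improved policy is proportional to the prior policy times exp(Q/eta). It is not the paper's algorithm and includes no tree search. The function names, the Gaussian prior, the toy `q_fn`, and the `temperature` parameter (eta) are all assumptions made for illustration.

```python
import numpy as np

def kl_regularized_weights(q_values, temperature):
    """Normalized weights proportional to exp(Q / temperature).

    Applied to actions sampled from a prior policy, these weights give a
    sample-based approximation of pi_new(a|s) ~ pi_prior(a|s) * exp(Q(s,a) / eta).
    """
    q = np.asarray(q_values, dtype=float)
    logits = (q - q.max()) / temperature  # subtract the max for numerical stability
    w = np.exp(logits)
    return w / w.sum()

def improved_action_mean(prior_mean, prior_std, q_fn,
                         temperature=1.0, num_samples=256, seed=0):
    """Estimate the mean action of the locally improved policy at one state.

    Hypothetical helper: the prior is a diagonal Gaussian and q_fn maps a
    single action to a scalar value estimate (from a critic or a model).
    """
    rng = np.random.default_rng(seed)
    shape = (num_samples,) + np.shape(prior_mean)
    actions = rng.normal(prior_mean, prior_std, size=shape)  # samples from the prior
    q = np.array([q_fn(a) for a in actions])
    w = kl_regularized_weights(q, temperature)
    # Weighted average of the sampled actions under the improved policy.
    return np.tensordot(w, actions, axes=1)

if __name__ == "__main__":
    # Toy value function (assumption): highest at a = (1, 1).
    toy_q = lambda a: -np.sum((a - 1.0) ** 2)
    mean = improved_action_mean(np.zeros(2), 1.0, toy_q,
                                temperature=0.5, num_samples=2000)
    print(mean)  # shifted from the prior mean (0, 0) toward (1, 1)
```

A smaller `temperature` trusts the value estimates more and moves further from the prior. A larger one keeps the improved policy close to the prior, which is the "local, regularized" aspect.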
