LG · Jul 15, 2021

PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration

arXiv:2107.07410v1 · 24 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the exploration bottleneck in model-based RL for control tasks, offering a computationally efficient solution with broad applicability, though it builds incrementally on prior model-based methods.

The paper tackles the exploration problem in model-based reinforcement learning by introducing PC-MLP, an algorithm with polynomial sample-complexity guarantees for both kernelized nonlinear regulators and linear MDPs. Empirically, it succeeds on exploration-heavy control tasks where existing empirical model-based RL approaches fail, and it remains competitive on common dense-reward benchmarks.

Model-based Reinforcement Learning (RL) is a popular learning paradigm due to its potential sample efficiency compared to model-free RL. However, existing empirical model-based RL approaches lack the ability to explore. This work studies a computationally and statistically efficient model-based algorithm for both Kernelized Nonlinear Regulators (KNR) and linear Markov Decision Processes (MDPs). For both models, our algorithm guarantees polynomial sample complexity and only uses access to a planning oracle. Experimentally, we first demonstrate the flexibility and efficacy of our algorithm on a set of exploration challenging control tasks where existing empirical model-based RL approaches completely fail. We then show that our approach retains excellent performance even in common dense reward control benchmarks that do not require heavy exploration. Finally, we demonstrate that our method can also perform reward-free exploration efficiently. Our code can be found at https://github.com/yudasong/PCMLP.
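The abstract does not say how the policy cover steers exploration. One construction often used in policy-cover methods is an elliptical bonus: state-action features gathered by the policies in the cover are summarized in a regularized covariance matrix, and a candidate state-action pair earns a larger bonus when its features point in directions the cover has rarely visited. The sketch below illustrates that idea only. It is not taken from the paper or its repository, and the function names, the regularizer `lam`, and the toy data are all hypothetical.

```python
import numpy as np


def cover_covariance(features: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Regularized covariance of features visited by the policy cover.

    `features` has shape (n_samples, d). `lam` is an assumed ridge term
    that keeps the matrix invertible.
    """
    d = features.shape[1]
    return features.T @ features + lam * np.eye(d)


def elliptical_bonus(phi: np.ndarray, sigma_inv: np.ndarray) -> np.ndarray:
    """Compute phi^T Sigma^{-1} phi for every row of `phi`."""
    return np.einsum("id,dk,ik->i", phi, sigma_inv, phi)


# Toy data (hypothetical): the cover has only visited the first feature direction.
visited = np.array([[1.0, 0.0]] * 9)
sigma_inv = np.linalg.inv(cover_covariance(visited, lam=1.0))

# Query one familiar direction and one unexplored direction.
queries = np.array([[1.0, 0.0], [0.0, 1.0]])
bonus = elliptical_bonus(queries, sigma_inv)
print(bonus)
```

In this toy setting the unexplored direction gets the larger bonus. A planner that optimizes reward plus bonus inside the learned model would therefore be drawn toward it.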

Code Implementations: 1 repo
