LG, AI, ML | Jun 19, 2019

When to Trust Your Model: Model-Based Policy Optimization

arXiv:1906.08253v3 | 1221 citations
Originality: Incremental advance
AI Analysis

This work addresses sample efficiency and scalability in reinforcement learning, offering an incremental improvement over prior model-based methods.

The paper tackles the trade-off in model-based reinforcement learning between the ease of generating data and the bias of model-generated data. It introduces a method that uses short model-generated rollouts branched from real data, which improves sample efficiency over prior model-based methods, matches the asymptotic performance of the best model-free algorithms, and scales to longer horizons.

Designing effective model-based reinforcement learning algorithms is difficult because the ease of data generation must be weighed against the bias of model-generated data. In this paper, we study the role of model usage in policy optimization both theoretically and empirically. We first formulate and analyze a model-based reinforcement learning algorithm with a guarantee of monotonic improvement at each step. In practice, this analysis is overly pessimistic and suggests that real off-policy data is always preferable to model-generated on-policy data, but we show that an empirical estimate of model generalization can be incorporated into such analysis to justify model usage. Motivated by this analysis, we then demonstrate that a simple procedure of using short model-generated rollouts branched from real data has the benefits of more complicated model-based algorithms without the usual pitfalls. In particular, this approach surpasses the sample efficiency of prior model-based methods, matches the asymptotic performance of the best model-free algorithms, and scales to horizons that cause other model-based methods to fail entirely.
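
The branched-rollout procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy linear `model_step`, the noisy `policy`, and every name and parameter here (`branched_rollouts`, `k`, `n_starts`, the random `real_states`) are hypothetical stand-ins chosen for the example. The point it shows is that each model rollout starts from a state drawn from real data and runs for only a few steps, so model error has little room to compound.

```python
import numpy as np

def model_step(state, action):
    """Hypothetical learned dynamics model (a toy linear stand-in)."""
    next_state = 0.9 * state + 0.1 * action
    reward = -float(np.sum(next_state ** 2))
    return next_state, reward

def policy(state, rng):
    """Hypothetical stochastic policy: a noisy proportional controller."""
    return -0.5 * state + 0.1 * rng.standard_normal(state.shape)

def branched_rollouts(real_states, k, n_starts, rng):
    """Generate k-step model rollouts branched from sampled real states.

    Returns a list of (state, action, reward, next_state) model transitions.
    """
    model_buffer = []
    # Branch points are sampled from real data, not from the model.
    idx = rng.integers(0, len(real_states), size=n_starts)
    for start in real_states[idx]:
        state = start.copy()
        for _ in range(k):  # short horizon limits compounding model error
            action = policy(state, rng)
            next_state, reward = model_step(state, action)
            model_buffer.append((state, action, reward, next_state))
            state = next_state
    return model_buffer

rng = np.random.default_rng(0)
# Assumption: random vectors stand in for states from a real replay buffer.
real_states = rng.standard_normal((100, 3))
model_buffer = branched_rollouts(real_states, k=5, n_starts=10, rng=rng)
print(len(model_buffer))  # 10 starts * 5 steps = 50 transitions
```

In a full training loop, transitions like these would be handed to a policy optimizer alongside the real data; that part is omitted here.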

Code Implementations: 11 repos

Data from Papers with Code (CC-BY-SA-4.0)
