LG · AI · ML · Jun 19, 2019

When to Trust Your Model: Model-Based Policy Optimization

Michael Janner, Justin Fu, Marvin Zhang, Sergey Levine

arXiv:1906.08253v3 · 46.7 · 1226 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses sample efficiency and scalability in reinforcement learning. It offers researchers and practitioners an incremental improvement over prior model-based methods.

The paper tackles a central tension in model-based reinforcement learning: model-generated data is cheap to produce but biased by model error. It introduces a method that uses short model-generated rollouts branched from real data. The method improves sample efficiency, matches the asymptotic performance of model-free algorithms, and scales to longer horizons.

Designing effective model-based reinforcement learning algorithms is difficult because the ease of data generation must be weighed against the bias of model-generated data. In this paper, we study the role of model usage in policy optimization both theoretically and empirically. We first formulate and analyze a model-based reinforcement learning algorithm with a guarantee of monotonic improvement at each step. In practice, this analysis is overly pessimistic and suggests that real off-policy data is always preferable to model-generated on-policy data, but we show that an empirical estimate of model generalization can be incorporated into such analysis to justify model usage. Motivated by this analysis, we then demonstrate that a simple procedure of using short model-generated rollouts branched from real data has the benefits of more complicated model-based algorithms without the usual pitfalls. In particular, this approach surpasses the sample efficiency of prior model-based methods, matches the asymptotic performance of the best model-free algorithms, and scales to horizons that cause other model-based methods to fail entirely.
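The core procedure, short model rollouts branched from real data, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The one-dimensional toy dynamics (`toy_model`), the policy (`toy_policy`), the `Transition` record, and the `branched_rollouts` helper are all hypothetical stand-ins. The sketch shows only how start states are drawn from a buffer of real transitions and then extended for `rollout_length` (k) steps with a learned model. Keeping k small limits how far model error can compound.

```python
import random
from dataclasses import dataclass


@dataclass
class Transition:
    state: float
    action: float
    reward: float
    next_state: float


def toy_model(state, action, rng):
    """Hypothetical learned model: a slightly biased, noisy estimate of s' = s + a."""
    next_state = state + 0.95 * action + rng.gauss(0.0, 0.01)
    reward = -abs(next_state)  # toy reward: stay near the origin
    return next_state, reward


def toy_policy(state):
    """Hypothetical policy that pushes the state toward 0."""
    return -0.5 * state


def branched_rollouts(real_buffer, model, policy, num_branches, rollout_length, rng):
    """Branch `num_branches` model rollouts of length k from real start states."""
    model_buffer = []
    for _ in range(num_branches):
        state = rng.choice(real_buffer).state  # branch point taken from real data
        for _ in range(rollout_length):        # short rollout of k model steps
            action = policy(state)
            next_state, reward = model(state, action, rng)
            model_buffer.append(Transition(state, action, reward, next_state))
            state = next_state
    return model_buffer


# Usage with a hypothetical buffer of three real transitions.
rng = random.Random(0)
real_buffer = [Transition(s, 0.0, -abs(s), s) for s in (1.0, -2.0, 0.5)]
model_data = branched_rollouts(real_buffer, toy_model, toy_policy,
                               num_branches=4, rollout_length=2, rng=rng)
print(len(model_data))  # 8
```

In the full method, the policy is then trained on this model-generated data. The method also uses an ensemble of probabilistic dynamics models and Soft Actor-Critic as the policy optimizer. Both components are omitted here for brevity.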

