LG · Jan 28, 2025

On Rollouts in Model-Based Reinforcement Learning

arXiv:2501.16918v2 · 9 citations · h-index: 20 · ICLR
Originality: Incremental advance
AI Analysis

The paper addresses a key bottleneck in model-based reinforcement learning, the accumulation of model errors, with the aim of improving data efficiency and planning. The advance is incremental because it builds on existing Dyna-style methods.

The paper tackles accumulated model errors in model-based reinforcement learning, which distort the data distribution and hinder long-term planning. It proposes Infoprop, a rollout mechanism that separates aleatoric from epistemic uncertainty and reduces the influence of the latter on the generated data. The resulting Infoprop-Dyna algorithm reports state-of-the-art performance among Dyna-style methods on MuJoCo benchmarks, with longer rollouts and higher data quality.

Model-based reinforcement learning (MBRL) seeks to enhance data efficiency by learning a model of the environment and generating synthetic rollouts from it. However, accumulated model errors during these rollouts can distort the data distribution, negatively impacting policy learning and hindering long-term planning. Thus, the accumulation of model errors is a key bottleneck in current MBRL methods. We propose Infoprop, a model-based rollout mechanism that separates aleatoric from epistemic model uncertainty and reduces the influence of the latter on the data distribution. Further, Infoprop keeps track of accumulated model errors along a model rollout and provides termination criteria to limit data corruption. We demonstrate the capabilities of Infoprop in the Infoprop-Dyna algorithm, reporting state-of-the-art performance in Dyna-style MBRL on common MuJoCo benchmark tasks while substantially increasing rollout length and data quality.
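
To make the two ideas in the abstract concrete, here is a minimal sketch of (a) splitting an ensemble's predictive variance into aleatoric and epistemic parts and (b) ending a rollout once accumulated epistemic uncertainty exceeds a budget. It is not the Infoprop algorithm: the abstract does not specify the paper's estimator or termination criteria, so this uses the common variance-based decomposition (law of total variance over ensemble members) as a stand-in. The function names, the scalar state, and the `epistemic_budget` parameter are all hypothetical.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(member_means, member_variances):
    """Split ensemble predictive variance into (aleatoric, epistemic).

    Assumes each ensemble member predicts a Gaussian over a scalar next state.
    Aleatoric: average of the members' predicted noise variances.
    Epistemic: variance of the members' predicted means (their disagreement).
    """
    aleatoric = fmean(member_variances)
    epistemic = pvariance(member_means)
    return aleatoric, epistemic


def rollout_length(step_predictions, epistemic_budget):
    """Count rollout steps kept before accumulated epistemic variance
    exceeds the budget (an illustrative criterion, not the paper's).

    step_predictions: one (member_means, member_variances) pair per step.
    """
    accumulated = 0.0
    kept = 0
    for member_means, member_variances in step_predictions:
        _, epistemic = decompose_uncertainty(member_means, member_variances)
        accumulated += epistemic
        if accumulated > epistemic_budget:
            break  # stop before adding a step the model is too unsure about
        kept += 1
    return kept


# Hypothetical three-step rollout with a two-member ensemble.
example_steps = [
    ([1.0, 1.0], [0.1, 0.1]),  # members agree: no epistemic uncertainty
    ([0.0, 2.0], [0.1, 0.1]),  # mild disagreement
    ([0.0, 4.0], [0.1, 0.1]),  # strong disagreement
]
kept_steps = rollout_length(example_steps, epistemic_budget=2.0)
```

In this sketch, disagreement between members drives termination while the predicted noise does not, which mirrors the abstract's point that only the epistemic part reflects model error that corrupts synthetic data.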

Code Implementations: 1 repo
