LG, RO · Feb 26, 2025

Efficient Reinforcement Learning by Guiding Generalist World Models with Non-Curated Data

arXiv:2502.19544v2 · 2 citations · h-index: 45
Originality: Incremental advance
AI Analysis

The paper addresses data scarcity and sample inefficiency in RL for robotics and broader AI applications. It represents a strong but targeted gain rather than a foundational breakthrough.

The paper tackles sample efficiency in online reinforcement learning by leveraging non-curated offline data, achieving a 102.8% relative improvement in aggregate score over learning-from-scratch baselines across 72 visuomotor tasks.
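For reference, a relative improvement is conventionally measured against the baseline score. Assuming that convention (the summary does not define how the aggregate score itself is computed), a 102.8% relative improvement means the method's aggregate score is roughly 2.03 times the baseline's:

```latex
\Delta_{\text{rel}} = \frac{S_{\text{ours}} - S_{\text{base}}}{S_{\text{base}}} = 1.028
\quad\Longrightarrow\quad
S_{\text{ours}} = (1 + 1.028)\, S_{\text{base}} = 2.028\, S_{\text{base}}
```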

Leveraging offline data is a promising way to improve the sample efficiency of online reinforcement learning (RL). This paper expands the pool of usable data for offline-to-online RL by leveraging abundant non-curated data that is reward-free, of mixed quality, and collected across multiple embodiments. Although learning a world model appears promising for utilizing such data, we find that naive fine-tuning fails to accelerate RL training on many tasks. Through careful investigation, we attribute this failure to the distributional shift between offline and online data during fine-tuning. To address this issue and effectively use the offline data, we propose two essential techniques: (i) experience rehearsal and (ii) execution guidance. With these modifications, the non-curated offline data substantially improves RL's sample efficiency. Under limited sample budgets, our method achieves a 102.8% relative improvement in aggregate score over learning-from-scratch baselines across 72 visuomotor tasks spanning 6 embodiments. On challenging tasks such as locomotion and robotic manipulation, it outperforms prior methods that utilize offline data by a decent margin.
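The abstract names the two techniques without spelling out their mechanics. The Python sketch below is a hypothetical illustration under stated assumptions, not the authors' implementation. It assumes that experience rehearsal can be modeled as drawing a fixed fraction (`rehearsal_ratio`) of every world-model fine-tuning batch from the offline data, so offline samples stay present as online data accumulates. It also assumes that execution guidance can be modeled as acting with a prior policy (for example, one fit to offline data) with probability `guide_prob` during online collection. All names (`MixedReplay`, `guided_action`, `rehearsal_ratio`, `guide_prob`) are invented for illustration.

```python
import random
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class MixedReplay:
    """Hypothetical replay buffer mixing online transitions with rehearsed offline ones.

    Assumption: 'experience rehearsal' is modeled as keeping a fixed share of
    offline samples in every fine-tuning batch to limit distributional shift.
    """
    offline: List[Any]                      # non-curated offline transitions (assumed non-empty)
    online: List[Any] = field(default_factory=list)
    rehearsal_ratio: float = 0.5            # hypothetical share of offline samples per batch

    def add(self, transition: Any) -> None:
        self.online.append(transition)

    def sample(self, batch_size: int, rng: random.Random) -> List[Any]:
        if not self.online:
            n_off = batch_size              # nothing collected online yet: offline only
        else:
            n_off = min(batch_size, int(round(batch_size * self.rehearsal_ratio)))
        n_on = batch_size - n_off
        batch = [rng.choice(self.offline) for _ in range(n_off)]
        batch += [rng.choice(self.online) for _ in range(n_on)]
        return batch


def guided_action(
    rl_policy: Callable[[Any], Any],
    prior_policy: Callable[[Any], Any],
    obs: Any,
    guide_prob: float,
    rng: random.Random,
) -> Any:
    """Hypothetical execution guidance: with probability guide_prob, act with the
    prior policy instead of the RL policy (decided per step here for simplicity)."""
    if rng.random() < guide_prob:
        return prior_policy(obs)
    return rl_policy(obs)


if __name__ == "__main__":
    rng = random.Random(0)
    buf = MixedReplay(offline=[("offline", i) for i in range(100)], rehearsal_ratio=0.25)
    warmup = buf.sample(8, rng)             # no online data yet, so all 8 come from offline
    for step in range(20):
        act = guided_action(lambda o: "rl", lambda o: "prior", step, guide_prob=0.3, rng=rng)
        buf.add(("online", step, act))
    batch = buf.sample(8, rng)              # round(8 * 0.25) = 2 offline, 6 online
    print(sum(t[0] == "offline" for t in batch), "offline samples in batch")
```

A fixed ratio and a fixed guidance probability are the simplest choices for a sketch; annealing `rehearsal_ratio` or `guide_prob` as online data accumulates is an equally plausible variant.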

