RO | AI | LG | May 19, 2025

DreamGen: Unlocking Generalization in Robot Learning through Video World Models

arXiv:2505.12705v2 | 87 citations | h-index: 33 | Has Code
Originality: Highly original
AI Analysis

This paper addresses the problem of scaling robot learning beyond manual data collection, a concern for robotics researchers and practitioners. It represents a novel approach rather than an incremental improvement.

The researchers tackled the problem of robot policy generalization across behaviors and environments by introducing DreamGen, a pipeline that uses video world models to generate synthetic robot data. The result was a humanoid robot performing 22 new behaviors in both seen and unseen environments, requiring teleoperation data from only a single pick-and-place task in one environment.

We introduce DreamGen, a simple yet highly effective 4-stage pipeline for training robot policies that generalize across behaviors and environments through neural trajectories - synthetic robot data generated from video world models. DreamGen leverages state-of-the-art image-to-video generative models, adapting them to the target robot embodiment to produce photorealistic synthetic videos of familiar or novel tasks in diverse environments. Since these models generate only videos, we recover pseudo-action sequences using either a latent action model or an inverse-dynamics model (IDM). Despite its simplicity, DreamGen unlocks strong behavior and environment generalization: a humanoid robot can perform 22 new behaviors in both seen and unseen environments, while requiring teleoperation data from only a single pick-and-place task in one environment. To evaluate the pipeline systematically, we introduce DreamGen Bench, a video generation benchmark that shows a strong correlation between benchmark performance and downstream policy success. Our work establishes a promising new axis for scaling robot learning well beyond manual data collection. Code available at https://github.com/NVIDIA/GR00T-Dreams.
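The pseudo-action recovery step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation, and nothing below comes from the GR00T-Dreams repository. It assumes a 1-D "frame" (a single robot position standing in for an image), a scalar inverse-dynamics model fitted by least squares on a handful of labeled transitions, and hypothetical names (`fit_toy_idm`, `label_video`, `NeuralTrajectory`). The point is only to show the shape of the idea: fit an IDM on real action-labeled data, then use it to label action-free generated video.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Assumption: a "frame" is a single float (a 1-D robot position), standing in
# for an image. A real IDM would operate on pixels with a neural network.
Frame = float


@dataclass
class NeuralTrajectory:
    """Generated frames paired with pseudo-actions recovered by the IDM."""
    frames: List[Frame]
    pseudo_actions: List[float]


def fit_toy_idm(transitions: Sequence[Tuple[Frame, Frame, float]]) -> float:
    """Least-squares fit of a scalar w so that action ~= w * (next - current).

    `transitions` holds (current_frame, next_frame, action) triples from
    real, action-labeled data such as teleoperation.
    """
    num = sum((nxt - cur) * act for cur, nxt, act in transitions)
    den = sum((nxt - cur) ** 2 for cur, nxt, _ in transitions)
    if den == 0:
        raise ValueError("transitions contain no motion; cannot fit the toy IDM")
    return num / den


def label_video(frames: Sequence[Frame], w: float) -> NeuralTrajectory:
    """Recover one pseudo-action per consecutive frame pair."""
    actions = [w * (nxt - cur) for cur, nxt in zip(frames, frames[1:])]
    return NeuralTrajectory(list(frames), actions)


# Illustrative values only: labeled transitions from a toy system whose
# dynamics are next = current + 0.5 * action.
real_transitions = [(0.0, 0.5, 1.0), (0.5, 1.5, 2.0), (1.5, 1.0, -1.0)]
w = fit_toy_idm(real_transitions)

# Stand-in for a video produced by the world model (it carries no actions).
generated_video = [0.0, 1.0, 1.5, 1.0]
trajectory = label_video(generated_video, w)
print(trajectory.pseudo_actions)
```

The resulting frame and pseudo-action pairs are what a downstream policy would be trained on. The abstract also names a latent action model as an alternative to the IDM, which this sketch does not cover.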

Code Implementations: 1 repo
