RO · CV · Dec 12, 2025

AnchorDream: Repurposing Video Diffusion for Embodiment-Aware Robot Data Synthesis

Georgia Tech
arXiv:2512.11797v1 · 6 citations · h-index: 22
Originality: Incremental advance
AI Analysis

This work addresses the costly and limited acquisition of robot data for imitation learning. It offers researchers and practitioners a practical way to scale data, with incremental improvements over existing generative methods.

The paper tackles the bottleneck of collecting diverse robot demonstrations for imitation learning. It introduces AnchorDream, an embodiment-aware world model that repurposes video diffusion models to synthesize robot data from a few demonstrations, and reports relative gains of 36.4% in simulator benchmarks and nearly doubled performance in real-world studies.

The collection of large-scale and diverse robot demonstrations remains a major bottleneck for imitation learning, as real-world data acquisition is costly and simulators offer limited diversity and fidelity with pronounced sim-to-real gaps. While generative models present an attractive solution, existing methods often alter only visual appearances without creating new behaviors, or suffer from embodiment inconsistencies that yield implausible motions. To address these limitations, we introduce AnchorDream, an embodiment-aware world model that repurposes pretrained video diffusion models for robot data synthesis. AnchorDream conditions the diffusion process on robot motion renderings, anchoring the embodiment to prevent hallucination while synthesizing objects and environments consistent with the robot's kinematics. Starting from only a handful of human teleoperation demonstrations, our method scales them into large, diverse, high-quality datasets without requiring explicit environment modeling. Experiments show that the generated data leads to consistent improvements in downstream policy learning, with relative gains of 36.4% in simulator benchmarks and nearly double performance in real-world studies. These results suggest that grounding generative world models in robot motion provides a practical path toward scaling imitation learning.
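The 36.4% figure above is a relative gain, which is easy to confuse with an absolute change in percentage points. The sketch below shows how such a number is typically computed; the function name and the success rates are hypothetical, chosen only to make the arithmetic easy to follow, and are not results from the paper.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Return the improvement of `improved` over `baseline` as a fraction of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


# Hypothetical success rates (NOT taken from the paper).
baseline_rate = 0.50   # policy trained on the original demonstrations only
augmented_rate = 0.60  # policy trained with additional synthesized data

gain = relative_gain(baseline_rate, augmented_rate)
print(f"absolute change: {augmented_rate - baseline_rate:.2f}")  # 10 percentage points
print(f"relative gain:   {gain:.1%}")                            # 20.0%
```

Under this definition, a policy whose success rate doubles has a relative gain of 100%, which is the scale on which "nearly double performance" should be read.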
