RO · May 29

RDGen: Demonstration Generation for High-Quality Robot Learning via Reinforcement Learning

arXiv:2605.30957 · 71.4 · h-index: 3
Predicted impact: top 24% in RO (last 90 days) · Originality: Highly original
AI Analysis

RDGen provides a scalable, cost-effective method for generating high-quality robot demonstration data, whose scarcity is a critical bottleneck to improving VLA models for general-purpose robot control.

This paper introduces RDGen, a sim-to-real reinforcement learning framework for generating high-quality robot demonstrations. It addresses the scarcity of such data for Vision-Language-Action (VLA) models by using trained RL policies as trajectory generators. On a pick-and-place task, the transferred policy achieves a high success rate and produces smoother trajectories than human teleoperation, which in turn yields better downstream VLA performance.
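The smoothness claim can be made concrete with a trajectory metric. The text does not say which metric the authors use; mean squared jerk (the average squared third time derivative of position) is a common choice and is assumed here. The sketch below estimates it with finite differences; the function name `mean_squared_jerk` and the toy trajectories are illustrative assumptions, not the paper's evaluation code.

```python
import numpy as np


def mean_squared_jerk(positions: np.ndarray, dt: float) -> float:
    """Mean squared jerk of a trajectory sampled at a fixed interval dt.

    positions has shape (T, D), e.g. end-effector xyz or joint angles over time.
    Lower values indicate a smoother trajectory.
    """
    positions = np.asarray(positions, dtype=float)
    if positions.ndim == 1:
        positions = positions[:, None]  # treat a 1-D array as a single dimension
    if positions.shape[0] < 4:
        raise ValueError("need at least 4 samples to estimate jerk")
    # The third-order finite difference divided by dt^3 approximates d^3x/dt^3.
    jerk = np.diff(positions, n=3, axis=0) / dt**3
    # Squared jerk magnitude per step, averaged over the trajectory.
    return float(np.mean(np.sum(jerk**2, axis=1)))


# Toy comparison: a minimum-jerk reach versus the same reach with small jitter
# (a stand-in for teleoperation noise; the noise level is an arbitrary assumption).
t = np.linspace(0.0, 1.0, 101)
dt = t[1] - t[0]
s = 10 * t**3 - 15 * t**4 + 6 * t**5  # minimum-jerk profile from 0 to 1
smooth = np.stack([s, 0.5 * s, 0.2 * s], axis=1)
rng = np.random.default_rng(0)
jittery = smooth + rng.normal(scale=1e-3, size=smooth.shape)
print(mean_squared_jerk(smooth, dt) < mean_squared_jerk(jittery, dt))  # True
```

Because differentiation amplifies high-frequency noise by roughly 1/dt^3, even millimetre-scale jitter dominates the metric, which is why jerk-based scores separate jittery teleoperation traces from smooth policy rollouts so clearly.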

Vision-Language-Action (VLA) models have emerged as a promising paradigm for general-purpose robot control. However, their performance remains fundamentally constrained by the availability of high-quality robot trajectory data. In current robot learning practice, such data are primarily collected through human teleoperation, which is labor-intensive, costly, and difficult to scale. In this paper, we propose RDGen, a sim-to-real reinforcement learning framework for generating high-quality robot demonstrations. Rather than employing reinforcement learning solely as the final control policy, RDGen leverages trained RL policies as a structured trajectory generator. The system consists of a VLM-based task parser that identifies task-relevant objects, a Grounding DINO-based object localizer, and an RL policy transferred from simulation to the real robot. Successful rollouts are then harvested as clean, high-quality demonstrations for downstream VLA training, while the simulation stage further provides a scalable source of additional trajectories at little marginal cost. Experiments on a pick-and-place task demonstrate that the transferred RL policy achieves a high task success rate. Compared with human teleoperation, RDGen produces significantly smoother trajectories and yields superior downstream VLA performance. These results indicate that RL-generated demonstrations can serve as more reliable and consistent supervisory signals for robot policy learning.
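The abstract describes a three-stage pipeline (VLM task parser, Grounding DINO-based localizer, transferred RL policy) whose successful rollouts are harvested as demonstrations. Below is a minimal sketch of that harvesting loop, assuming each stage can be wrapped as a plain Python function. `collect_demonstrations`, `Rollout`, and all `toy_*` stand-ins are hypothetical: they do not reproduce the paper's code, its VLM prompt, or any Grounding DINO API, and a toy point-reaching task replaces both the simulator and the real robot.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

import numpy as np


@dataclass
class Rollout:
    instruction: str
    observations: List[np.ndarray] = field(default_factory=list)
    actions: List[np.ndarray] = field(default_factory=list)
    success: bool = False


# Hypothetical stage signatures standing in for the three RDGen components.
TaskParser = Callable[[str], List[str]]                    # instruction -> object names (VLM)
Localizer = Callable[[np.ndarray, List[str]], np.ndarray]  # image, names -> positions (detector)
Policy = Callable[[np.ndarray], np.ndarray]                # observation -> action (RL policy)


def collect_demonstrations(instructions: Sequence[str], parse_task: TaskParser,
                           localize: Localizer, policy: Policy,
                           run_episode: Callable[[Policy, np.ndarray, str], Rollout],
                           get_image: Callable[[], np.ndarray]) -> List[Rollout]:
    """Run parser -> localizer -> policy per instruction; keep successful rollouts."""
    demos = []
    for instruction in instructions:
        names = parse_task(instruction)
        positions = localize(get_image(), names)
        rollout = run_episode(policy, positions, instruction)
        if rollout.success:  # failed rollouts are discarded, not used as demonstrations
            demos.append(rollout)
    return demos


# ---- Toy stand-ins (all hypothetical) for a point-reaching "pick" task ----
WORKSPACE = 0.5  # gripper coordinates are clipped to [-0.5, 0.5]
OBJECTS: Dict[str, np.ndarray] = {
    "cube": np.array([0.3, 0.1, 0.05]),  # reachable
    "ball": np.array([0.9, 0.9, 0.9]),   # outside the workspace, so the episode fails
}


def toy_parse(instruction: str) -> List[str]:
    return [instruction.split()[-1]]  # last word names the target object


def toy_localize(image: np.ndarray, names: List[str]) -> np.ndarray:
    return np.stack([OBJECTS[n] for n in names])  # ignores the image; uses known positions


def toy_policy(obs: np.ndarray) -> np.ndarray:
    gripper, goal = obs[:3], obs[3:]
    return np.clip(goal - gripper, -0.05, 0.05)  # bounded step toward the target


def toy_episode(policy: Policy, positions: np.ndarray, instruction: str,
                max_steps: int = 50, tol: float = 1e-3) -> Rollout:
    goal, gripper = positions[0], np.zeros(3)
    rollout = Rollout(instruction)
    for _ in range(max_steps):
        obs = np.concatenate([gripper, goal])
        action = policy(obs)
        rollout.observations.append(obs)
        rollout.actions.append(action)
        gripper = np.clip(gripper + action, -WORKSPACE, WORKSPACE)
        if np.linalg.norm(gripper - goal) < tol:
            rollout.success = True
            break
    return rollout


demos = collect_demonstrations(["pick up the cube", "pick up the ball"],
                               toy_parse, toy_localize, toy_policy, toy_episode,
                               get_image=lambda: np.zeros((64, 64, 3)))
print([d.instruction for d in demos])  # ['pick up the cube']
```

Only the rollout that reaches its target survives the filter. The success test here is a simple distance threshold, which is an assumption of this toy; the same keep-only-successes loop could be run over simulated episodes to obtain the additional low-cost trajectories the abstract mentions.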
