RO · May 29

RDGen: Demonstration Generation for High-Quality Robot Learning via Reinforcement Learning

Zijian Zhu, Menglin Zou, Zhuang Li, Yaojie Tu, Xinhai Sun

arXiv:2605.30957 · 71.4 · h-index: 3

Predicted impact: top 24% in RO · last 90 days · Originality: Highly original

AI Analysis

RDGen provides a scalable and cost-effective method for generating high-quality robot demonstration data, whose scarcity is a critical bottleneck for improving the performance of VLA models in general-purpose robot control.

This paper introduces RDGen, a sim-to-real reinforcement learning framework for generating high-quality robot demonstrations. It addresses the scarcity of such data for Vision-Language-Action (VLA) models by using trained RL policies as a trajectory generator. On a pick-and-place task, the transferred policy achieves a high success rate, and its trajectories are smoother than human teleoperation and yield superior downstream VLA performance.

Vision-Language-Action (VLA) models have emerged as a promising paradigm for general-purpose robot control. However, their performance remains fundamentally constrained by the availability of high-quality robot trajectory data. In current robot learning practice, such data are primarily collected through human teleoperation, which is labor-intensive, costly, and difficult to scale. In this paper, we propose RDGen, a sim-to-real reinforcement learning framework for generating high-quality robot demonstrations. Rather than employing reinforcement learning solely as the final control policy, RDGen leverages trained RL policies as a structured trajectory generator. The system consists of a VLM-based task parser that identifies task-relevant objects, a Grounding DINO-based object localizer, and an RL policy transferred from simulation to the real robot. Successful rollouts are then harvested as clean, high-quality demonstrations for downstream VLA training, while the simulation stage further provides a scalable source of additional trajectories at little marginal cost. Experiments on a pick-and-place task demonstrate that the transferred RL policy achieves a high task success rate. Compared with human teleoperation, RDGen produces significantly smoother trajectories and yields superior downstream VLA performance. These results indicate that RL-generated demonstrations can serve as more reliable and consistent supervisory signals for robot policy learning.
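
The harvesting and smoothness comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not say which smoothness metric the authors use, so mean squared jerk (the third finite difference of position) is an assumption here. The `Rollout` class, its field layout, and both function names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Rollout:
    """One episode (hypothetical layout): positions sampled at a fixed rate, plus a success flag."""
    positions: List[Sequence[float]]  # assumed: one (x, y, z) tuple per timestep
    success: bool


def mean_squared_jerk(positions: List[Sequence[float]], dt: float) -> float:
    """Mean squared third finite difference of position; lower means smoother.

    Assumed metric for illustration; the paper's own metric may differ.
    """
    if len(positions) < 4:
        raise ValueError("need at least 4 samples to estimate jerk")
    count = len(positions) - 3
    total = 0.0
    for t in range(count):
        for axis in range(len(positions[0])):
            p0, p1, p2, p3 = (positions[t + k][axis] for k in range(4))
            # Forward-difference estimate of the third derivative.
            jerk = (p3 - 3.0 * p2 + 3.0 * p1 - p0) / dt ** 3
            total += jerk * jerk
    return total / count


def harvest_demonstrations(rollouts: List[Rollout]) -> List[Rollout]:
    """Keep only successful rollouts, mirroring the 'harvest successful rollouts' step."""
    return [r for r in rollouts if r.success]
```

Under these assumptions, a comparison between RL-generated and teleoperated data would compute `mean_squared_jerk` over each harvested trajectory and compare the averages of the two sets.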
