RO | Jun 4

Beyond Imitation: Reinforcement Learning-Based Sim-Real Co-Training for VLA Models

Liangzhi Shi, Shuaihang Chen, Feng Gao, Yinuo Chen, Kang Chen, Tonghe Zhang, Hongzhi Zang, Jiakai Zhou, Weinan Zhang, Chao Yu, Yu Wang

arXiv:2602.12628 | 81.3 | h-index: 9

AI Analysis

For robotics researchers deploying VLA models, this work provides a practical method to leverage simulation for improving real-world performance and data efficiency, though it is an incremental improvement over existing co-training approaches.

The paper introduces an RL-based sim-real co-training framework (RL-Co) for VLA models that uses interactive simulation with reinforcement learning and an auxiliary supervised loss on real data to prevent catastrophic forgetting. On four real-world tabletop tasks, RL-Co achieves +24% and +20% success rate improvements over baselines for OpenVLA and π0.5, respectively, and improves generalization and data efficiency.

Simulation offers a scalable and low-cost way to enrich vision-language-action (VLA) training, reducing reliance on expensive real-robot demonstrations. However, most sim-real co-training methods rely on supervised fine-tuning (SFT), which treats simulation as a static source of demonstrations and does not exploit large-scale closed-loop interaction. Consequently, real-world gains and generalization are often limited. In this paper, we propose an RL-based sim-real Co-training (RL-Co) framework that leverages interactive simulation while preserving real-world capabilities. Our method follows a generic two-stage design: we first warm-start the policy with SFT on a mixture of real and simulated demonstrations, then fine-tune it with reinforcement learning in simulation while adding an auxiliary supervised loss on real-world data to anchor the policy and mitigate catastrophic forgetting. We evaluate our framework on four real-world tabletop manipulation tasks using two representative VLA architectures, OpenVLA and $π_{0.5}$, and observe consistent improvements over real-only fine-tuning and SFT-based co-training, including +24% real-world success on OpenVLA and +20% on $π_{0.5}$. Beyond higher success rates, RL co-training yields stronger generalization to unseen task variations and substantially improved real-world data efficiency, providing a practical and scalable pathway for leveraging simulation to enhance real-robot deployment.
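The second stage described above combines an RL objective on simulated rollouts with an auxiliary supervised loss on real demonstrations. The following is a minimal sketch of that combination, not the authors' implementation. The abstract does not name the RL algorithm, the loss weighting, or the action representation, so everything here is an assumption made for illustration: a toy discrete-action policy, a REINFORCE-style surrogate for the RL term, a cross-entropy term for the real data, and the hypothetical names `co_training_loss` and `beta`.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def co_training_loss(sim_batch, real_batch, beta=1.0):
    """Toy stage-two objective: RL loss on sim data plus a supervised anchor.

    sim_batch:  list of (logits, action, advantage) from simulated rollouts.
    real_batch: list of (logits, action) from real-world demonstrations.
    beta:       weight of the auxiliary supervised term (an assumed
                hyperparameter; the abstract gives no value or schedule).
    """
    # REINFORCE-style surrogate: raise log-probability of actions with
    # positive advantage, lower it for actions with negative advantage.
    rl_loss = -sum(
        adv * log_softmax(logits)[action] for logits, action, adv in sim_batch
    ) / len(sim_batch)

    # Behavior-cloning cross-entropy on real demonstrations, which anchors
    # the policy to real-world behavior during simulated RL.
    sft_loss = -sum(
        log_softmax(logits)[action] for logits, action in real_batch
    ) / len(real_batch)

    return rl_loss + beta * sft_loss, rl_loss, sft_loss


if __name__ == "__main__":
    sim = [([0.2, 1.1, -0.3], 1, 0.8), ([0.0, 0.5, 0.5], 2, -0.4)]
    real = [([1.5, 0.1, -1.0], 0), ([0.3, 0.3, 2.0], 2)]
    total, rl, sft = co_training_loss(sim, real, beta=0.5)
    print(f"total={total:.4f} rl={rl:.4f} sft={sft:.4f}")
```

In this sketch, setting `beta=0` reduces the objective to simulation-only RL, while a larger `beta` pulls the policy more strongly toward the real demonstrations. In a real VLA pipeline the logits would come from the model's forward pass and the gradient would be taken by an autodiff framework.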
