RO CV Nov 22, 2025

MobileVLA-R1: Reinforcing Vision-Language-Action for Mobile Robots

Ting Huang, Dongjian Li, Rui Yang, Zeyu Zhang, Zida Yang, Hao Tang

arXiv:2511.17889v1 11 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses a fundamental problem in robotics: achieving stable, generalizable control from language instructions. It appears incremental, however, with only a modest performance improvement.

The paper tackles the challenge of grounding natural-language instructions into continuous control for quadruped robots. It introduces MobileVLA-R1, a unified vision-language-action framework that improves reasoning and control, and reports approximately a 5% gain over baselines on VLN and VLA evaluations.

Grounding natural-language instructions into continuous control for quadruped robots remains a fundamental challenge in vision-language-action. Existing methods struggle to bridge high-level semantic reasoning and low-level actuation, leading to unstable grounding and weak generalization in the real world. To address these issues, we present MobileVLA-R1, a unified vision-language-action framework that enables explicit reasoning and continuous control for quadruped robots. We construct MobileVLA-CoT, a large-scale dataset of multi-granularity chain-of-thought (CoT) for embodied trajectories, providing structured reasoning supervision for alignment. Built upon this foundation, we introduce a two-stage training paradigm that combines supervised CoT alignment with GRPO reinforcement learning to enhance reasoning consistency, control stability, and long-horizon execution. Extensive evaluations on VLN and VLA tasks demonstrate superior performance over strong baselines, with approximately a 5% improvement. Real-world deployment on a quadruped robot validates robust performance in complex environments. Code: https://github.com/AIGeeksGroup/MobileVLA-R1. Website: https://aigeeksgroup.github.io/MobileVLA-R1.
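
The abstract names GRPO as the reinforcement learning stage but does not describe its reward design or policy update. As a rough illustration only, the sketch below shows the group-relative advantage step commonly associated with GRPO: several rollouts are sampled for the same instruction, and each rollout's reward is normalized against the mean and standard deviation of its own group. The function name, the example rewards, and the choice of population standard deviation are assumptions made for illustration, not details taken from the paper or its code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    Hypothetical helper: the name and the eps value are illustrative.
    """
    mu = mean(rewards)
    # Population standard deviation is an assumption; implementations vary.
    sigma = pstdev(rewards)
    # eps keeps the division defined when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four rollouts sampled from the same instruction.
example_rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(example_rewards)
```

Rollouts that score above their group's average receive a positive advantage and those below receive a negative one, so no separate learned value function is needed to provide a baseline.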

