ROAICVJul 21, 2025

GR-3 Technical Report

arXiv:2507.15493v293 citationsh-index: 10
Originality Incremental advance
AI Analysis

This work addresses the challenge of creating versatile robots for assisting humans in daily life, representing an incremental advancement with strong specific gains.

The paper tackles the problem of building generalist robot policies by developing GR-3, a large-scale vision-language-action model that generalizes to novel objects, environments, and abstract instructions, and surpasses the state-of-the-art baseline method $π_0$ on a wide variety of challenging tasks.

We report our recent progress towards building generalist robot policies, the development of GR-3. GR-3 is a large-scale vision-language-action (VLA) model. It showcases exceptional capabilities in generalizing to novel objects, environments, and instructions involving abstract concepts. Furthermore, it can be efficiently fine-tuned with minimal human trajectory data, enabling rapid and cost-effective adaptation to new settings. GR-3 also excels in handling long-horizon and dexterous tasks, including those requiring bi-manual manipulation and mobile movement, showcasing robust and reliable performance. These capabilities are achieved through a multi-faceted training recipe that includes co-training with web-scale vision-language data, efficient fine-tuning from human trajectory data collected via VR devices, and effective imitation learning with robot trajectory data. In addition, we introduce ByteMini, a versatile bi-manual mobile robot designed with exceptional flexibility and reliability, capable of accomplishing a wide range of tasks when integrated with GR-3. Through extensive real-world experiments, we show GR-3 surpasses the state-of-the-art baseline method, $π_0$, on a wide variety of challenging tasks. We hope GR-3 can serve as a step towards building generalist robots capable of assisting humans in daily life.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes