RO · CV · Aug 30, 2025

Galaxea Open-World Dataset and G0 Dual-System VLA Model

Tao Jiang, Tianyuan Yuan, Yicheng Liu, Chenhao Lu, Jianning Cui, Xiao Liu, Shuiqi Cheng, Jiyang Gao, Huazhe Xu, Hang Zhao

arXiv:2509.00576v1 · 32.059 citations · h-index: 8

Originality: Incremental advance

AI Analysis

This work addresses robot learning in authentic human environments, contributing a dataset and a model that could advance robotics, though it appears incremental because it builds on existing VLM and VLA methods.

The paper tackles the challenge of enabling robots to perform diverse tasks in real-world environments. It introduces the Galaxea Open-World Dataset, a large-scale collection of robot behaviors with language annotations, and the G0 dual-system VLA model, which achieves strong performance on benchmarks including tabletop manipulation and long-horizon mobile manipulation.

We present Galaxea Open-World Dataset, a large-scale, diverse collection of robot behaviors recorded in authentic human living and working environments. All demonstrations are gathered using a consistent robotic embodiment, paired with precise subtask-level language annotations to facilitate both training and evaluation. Building on this dataset, we introduce G0, a dual-system framework that couples a Vision-Language Model (VLM) for multimodal planning with a Vision-Language-Action (VLA) model for fine-grained execution. G0 is trained using a three-stage curriculum: cross-embodiment pre-training, single-embodiment pre-training, and task-specific post-training. A comprehensive benchmark spanning tabletop manipulation, few-shot learning, and long-horizon mobile manipulation demonstrates the effectiveness of our approach. In particular, we find that the single-embodiment pre-training stage, together with the Galaxea Open-World Dataset, plays a critical role in achieving strong performance.
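The dual-system split and the three-stage curriculum described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: `DualSystemAgent`, `toy_planner`, `toy_executor`, `run_curriculum`, and the plan-once-then-execute loop are hypothetical, and the toy functions merely stand in for the learned VLM planner (which emits subtask-level language instructions) and the VLA executor (which maps a subtask plus an observation to an action).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical interfaces; real systems would wrap learned models here.
Planner = Callable[[str, dict], List[str]]      # (instruction, observation) -> subtasks
Executor = Callable[[str, dict], List[float]]   # (subtask, observation) -> action


@dataclass
class DualSystemAgent:
    planner: Planner            # slow, VLM-style multimodal planner (assumed)
    executor: Executor          # fast, VLA-style low-level policy (assumed)
    steps_per_subtask: int = 3  # fixed control steps per subtask (assumption)

    def run(self, instruction: str,
            observe: Callable[[], dict]) -> List[Tuple[str, List[float]]]:
        """Plan once from the first observation, then execute each subtask."""
        trace = []
        subtasks = self.planner(instruction, observe())
        for subtask in subtasks:
            for _ in range(self.steps_per_subtask):
                action = self.executor(subtask, observe())
                trace.append((subtask, action))
        return trace


# Training order named in the abstract; the per-stage work is a placeholder hook.
CURRICULUM_STAGES = (
    "cross-embodiment pre-training",
    "single-embodiment pre-training",
    "task-specific post-training",
)


def run_curriculum(train_stage: Callable[[str], None]) -> List[str]:
    """Run the stages strictly in order and return the names that completed."""
    completed = []
    for stage in CURRICULUM_STAGES:
        train_stage(stage)  # hypothetical hook; real training is out of scope
        completed.append(stage)
    return completed


# --- Toy usage -------------------------------------------------------------
def toy_planner(instruction: str, obs: dict) -> List[str]:
    # Hypothetical stand-in for the VLM: split "A then B" into subtasks.
    return [s.strip() for s in instruction.split(" then ")]


def toy_executor(subtask: str, obs: dict) -> List[float]:
    # Hypothetical stand-in for the VLA: a dummy action from subtask and time.
    return [float(len(subtask)), float(obs["t"])]


clock = {"t": 0}


def observe() -> dict:
    # Each call advances a fake clock, standing in for a new camera frame.
    clock["t"] += 1
    return {"t": clock["t"]}


agent = DualSystemAgent(toy_planner, toy_executor, steps_per_subtask=2)
trace = agent.run("pick up the cup then place it in the sink", observe)
stages_done = run_curriculum(lambda stage: None)
```

A real agent would presumably replan as subtasks finish or fail; planning once up front only keeps the sketch short while showing the separation between planning and execution.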

