RO · LG · Nov 7, 2025

TwinVLA: Data-Efficient Bimanual Manipulation with Twin Single-Arm Vision-Language-Action Models

arXiv:2511.05275v1 · 3 citations · h-index: 2
Originality: Incremental advance
AI Analysis

This paper addresses data inefficiency in bimanual robotic manipulation. It offers researchers and practitioners a scalable approach that leverages public single-arm datasets, though the advance is incremental because it builds on existing single-arm models.

The paper tackles the challenge of adapting vision-language-action models for bimanual manipulation tasks without requiring extensive bimanual data, by introducing TwinVLA, a modular framework that composes two pretrained single-arm models; it outperforms a comparably-sized monolithic model and narrows the gap to state-of-the-art models that use proprietary data.

Vision-language-action models (VLAs) trained on large-scale robotic datasets have demonstrated strong performance on manipulation tasks, including bimanual tasks. However, because most public datasets focus on single-arm demonstrations, adapting VLAs for bimanual tasks typically requires substantial additional bimanual data and fine-tuning. To address this challenge, we introduce TwinVLA, a modular framework that composes two copies of a pretrained single-arm VLA into a coordinated bimanual VLA. Unlike monolithic cross-embodiment models trained on mixtures of single-arm and bimanual data, TwinVLA improves both data efficiency and performance by composing pretrained single-arm policies. Across diverse bimanual tasks in real-world and simulation settings, TwinVLA outperforms a comparably-sized monolithic RDT-1B model without requiring any bimanual pretraining. Furthermore, it narrows the gap to the state-of-the-art model $\pi_0$, which relies on extensive proprietary bimanual data and compute. These results establish our modular composition approach as a data-efficient and scalable path toward high-performance bimanual manipulation, leveraging public single-arm data.

