RO · CV · May 28, 2025

ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation

arXiv:2505.22159v3 · 66 citations · h-index: 8
Originality: Incremental advance
AI Analysis

The paper addresses dexterous robotic manipulation, but the advance is incremental: it builds on existing VLA models by adding force sensing.

It tackles contact-rich manipulation, where Vision-Language-Action models struggle with fine-grained force control. It improves average task success by 23.2% over baselines and reaches up to 80% success on tasks such as plug insertion.

Vision-Language-Action (VLA) models have advanced general-purpose robotic manipulation by leveraging pretrained visual and linguistic representations. However, they struggle with contact-rich tasks that require fine-grained control involving force, especially under visual occlusion or dynamic uncertainty. To address these limitations, we propose ForceVLA, a novel end-to-end manipulation framework that treats external force sensing as a first-class modality within VLA systems. ForceVLA introduces FVLMoE, a force-aware Mixture-of-Experts fusion module that dynamically integrates pretrained visual-language embeddings with real-time 6-axis force feedback during action decoding. This enables context-aware routing across modality-specific experts, enhancing the robot's ability to adapt to subtle contact dynamics. We also introduce ForceVLA-Data, a new dataset comprising synchronized vision, proprioception, and force-torque signals across five contact-rich manipulation tasks. ForceVLA improves average task success by 23.2% over strong pi_0-based baselines, achieving up to 80% success in tasks such as plug insertion. Our approach highlights the importance of multimodal integration for dexterous manipulation and sets a new benchmark for physically intelligent robotic control. Code and data will be released at https://sites.google.com/view/forcevla2025.
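
To make the idea of force-aware expert routing more concrete, here is a minimal sketch of a top-k Mixture-of-Experts layer that treats a 6-axis force-torque reading as one extra token alongside vision-language tokens. It is not the paper's FVLMoE: the class name `ForceAwareMoE`, the linear force projection, the linear experts, the residual connection, and every dimension are assumptions chosen for illustration, and the weights are random rather than trained.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def rand_matrix(rng, rows, cols, scale=0.1):
    """Random Gaussian matrix; stands in for learned weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class ForceAwareMoE:
    """Toy top-k MoE over tokens (hypothetical; not the paper's FVLMoE)."""

    def __init__(self, dim, num_experts=4, top_k=2, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Assumption: a single linear map lifts the 6-axis wrench to a token.
        self.force_proj = rand_matrix(rng, dim, 6)
        self.router = rand_matrix(rng, num_experts, dim)
        # Assumption: each expert is a single linear layer.
        self.experts = [rand_matrix(rng, dim, dim) for _ in range(num_experts)]

    def route(self, token):
        """Return (expert_index, weight) pairs for the top-k experts."""
        logits = matvec(self.router, token)
        order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        top = order[: self.top_k]
        weights = softmax([logits[i] for i in top])
        return list(zip(top, weights))

    def forward(self, vl_tokens, wrench):
        """Fuse vision-language tokens with one projected force token."""
        force_token = matvec(self.force_proj, wrench)
        tokens = vl_tokens + [force_token]
        fused = []
        for tok in tokens:
            out = list(tok)  # residual path (an assumption of this sketch)
            for idx, w in self.route(tok):
                expert_out = matvec(self.experts[idx], tok)
                out = [o + w * e for o, e in zip(out, expert_out)]
            fused.append(out)
        return fused


# Usage: three made-up vision-language tokens plus one wrench reading.
moe = ForceAwareMoE(dim=8)
vl_tokens = [[0.1 * (i + j) for j in range(8)] for i in range(3)]
wrench = [0.5, -0.2, 3.0, 0.01, 0.0, -0.03]  # Fx, Fy, Fz, Tx, Ty, Tz
fused = moe.forward(vl_tokens, wrench)
```

The router picks experts per token, so the force token and the vision-language tokens can end up on different experts. That is one plausible reading of "context-aware routing across modality-specific experts"; the actual design would need to be checked against the released code.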

