RO | AI | LG | Oct 20, 2025

Bridging Embodiment Gaps: Deploying Vision-Language-Action Models on Soft Robots

arXiv:2510.17369v1 | 1 citation | h-index: 22
Originality: Incremental advance
AI Analysis

This work addresses the need for safe and adaptable robotic systems in human-centered environments, though it is incremental, since it applies existing models to a new domain.

The paper tackles the problem of deploying Vision-Language-Action models on soft robots to enable safe human-robot interaction, showing that targeted finetuning allows a soft robot to perform on par with its rigid counterpart in manipulation tasks.

Robotic systems are increasingly expected to operate in human-centered, unstructured environments where safety, adaptability, and generalization are essential. Vision-Language-Action (VLA) models have been proposed as a language-guided, generalized control framework for real robots. However, their deployment has been limited to conventional serial-link manipulators. Given the rigidity of these manipulators and the unpredictability of learning-based control, the ability to interact safely with the environment is missing yet critical. In this work, we present the deployment of a VLA model on a soft continuum manipulator to demonstrate autonomous, safe human-robot interaction. We present a structured finetuning and deployment pipeline evaluating two state-of-the-art VLA models (OpenVLA-OFT and $π_0$) across representative manipulation tasks, and show that while out-of-the-box policies fail due to embodiment mismatch, with targeted finetuning the soft robot performs on par with its rigid counterpart. Our findings highlight the necessity of finetuning for bridging embodiment gaps, and demonstrate that coupling VLA models with soft robots enables safe and flexible embodied AI in human-shared environments.

