RO · AI · May 3

VILAS: A VLA-Integrated Low-cost Architecture with Soft Grasping for Robotic Manipulation

arXiv:2605.02037 · 61.6
AI Analysis

This work provides a practical, low-cost platform for researchers to deploy VLA policies on real robots, but the contribution is incremental: it applies existing models to a new hardware setup.

VILAS is a low-cost robotic manipulation platform that integrates a collaborative arm, a gripper, and cameras for VLA policy learning. It uses a kirigami-based soft gripper extension for safe grasping. Three VLA models (pi_0, pi_0.5, GR00T N1.6) are fine-tuned on a grape grasping task, demonstrating that such policies can be deployed on accessible hardware.

We present VILAS, a fully low-cost, modular robotic manipulation platform designed to support end-to-end vision-language-action (VLA) policy learning and deployment on accessible hardware. The system integrates a Fairino FR5 collaborative arm, a Jodell RG52-50 electric gripper, and a dual-camera perception module, unified through a ZMQ-based communication architecture that seamlessly coordinates teleoperation, data collection, and policy deployment within a single framework. To enable safe manipulation of fragile objects without relying on explicit force sensing, we design a kirigami-based soft compliant gripper extension that induces predictable deformation under compressive loading, providing gentle and repeatable contact with delicate targets. We deploy and evaluate three state-of-the-art VLA models on the VILAS platform: pi_0, pi_0.5, and GR00T N1.6. All models are fine-tuned from publicly released pretrained checkpoints using an identical demonstration dataset collected via our teleoperation pipeline. Experiments on a grape grasping task validate the effectiveness of the proposed system, confirming that capable manipulation policies can be successfully trained and deployed on low-cost modular hardware. Our results further provide practical insights into the deployment characteristics of current VLA models in real-world settings.
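
The abstract describes a single ZMQ-based layer shared by teleoperation, data collection, and policy deployment, but does not give its message layout. The sketch below is one way such a layer could frame commands, under stated assumptions: the topic names, the `ArmCommand` fields, and the six-joint layout are all hypothetical and not taken from the paper. It uses only the standard library and produces the two-frame `[topic, payload]` lists that a ZMQ publisher would typically hand to pyzmq's `send_multipart`, with subscribers filtering on the topic prefix.

```python
import json
import time
from dataclasses import dataclass, asdict, field

# Hypothetical topic names; the paper does not specify its message layout.
TOPIC_TELEOP = b"teleop"   # commands produced by a human operator
TOPIC_POLICY = b"policy"   # commands produced by a VLA policy


@dataclass
class ArmCommand:
    """One arm/gripper command, independent of who produced it (assumed schema)."""
    joint_targets: list      # target joint angles in radians (six assumed)
    gripper_closed: bool
    stamp: float = field(default_factory=time.time)


def encode(topic: bytes, cmd: ArmCommand) -> list:
    """Build a two-frame message: [topic, UTF-8 JSON payload]."""
    return [topic, json.dumps(asdict(cmd)).encode("utf-8")]


def decode(frames: list) -> tuple:
    """Inverse of encode: return (topic, ArmCommand)."""
    topic, payload = frames
    return topic, ArmCommand(**json.loads(payload.decode("utf-8")))


def log_step(dataset: list, frames: list) -> None:
    """Data collection: append the command to the dataset, tagged with its source."""
    topic, cmd = decode(frames)
    dataset.append({"source": topic.decode("utf-8"), **asdict(cmd)})


# Usage: the same frames serve the robot driver and the data logger.
dataset = []
frames = encode(TOPIC_TELEOP, ArmCommand([0.0] * 6, gripper_closed=False))
log_step(dataset, frames)
```

Because teleoperation and policy commands share one schema and differ only in the topic frame, a robot driver could subscribe to either source without code changes, and a logger could record both. That is one plausible reading of "a single framework", not a description of the authors' implementation.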
