RO · May 29

CLAW: A Vision-Language-Action Framework for Weight-Aware Robotic Grasping

arXiv:2509.14143 · 9.33 citations · h-index: 3
Predicted impact top 39% in RO · last 90 days
Originality: Incremental advance
AI Analysis

For robotic manipulation tasks requiring precise weight thresholds, CLAW provides a modular solution to integrate symbolic reasoning with visuomotor control, though the approach is incremental.

CLAW introduces a framework that decouples condition evaluation from action generation for weight-aware robotic grasping, using a fine-tuned CLIP model to monitor scale readouts and generate prompts for a VLA policy. It outperforms baseline π0 models in single-object and dual-arm mixed-object tasks.

Vision-language-action (VLA) models have recently emerged as a promising paradigm for robotic control, enabling end-to-end policies that ground natural language instructions into visuomotor actions. However, current VLAs often struggle to satisfy precise task constraints, such as stopping based on numeric thresholds, since their observation-to-action mappings are implicitly shaped by training data and lack explicit mechanisms for condition monitoring. In this work, we propose CLAW (CLIP-Language-Action for Weight), a framework that decouples condition evaluation from action generation. CLAW leverages a fine-tuned CLIP model as a lightweight prompt generator, which continuously monitors the digital readout of a scale and produces discrete directives based on task-specific weight thresholds. These prompts are then consumed by $π_0$, a flow-based VLA policy, which integrates the prompts with multi-view camera observations to produce continuous robot actions. This design enables CLAW to combine symbolic weight reasoning with high-frequency visuomotor control. We validate CLAW on three experimental setups: single-object grasping and mixed-object tasks requiring dual-arm manipulation. Across all conditions, CLAW reliably executes weight-aware behaviors and outperforms both raw-$π_0$ and fine-tuned $π_0$ models. A video of our paper is available online at https://youtu.be/MuMYj2QgReI.
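The decoupling described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes the scale reading is already available as a number (in CLAW, a fine-tuned CLIP model interprets the scale's digital readout from images), and the names `WeightTask`, `directive_for`, `run_episode`, and `toy_policy`, as well as the directive strings, are hypothetical. The point is only that the threshold check lives outside the policy, which receives nothing but a discrete text prompt.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class WeightTask:
    """Hypothetical task spec: stop once the scale shows at least target_g grams."""
    target_g: float


def directive_for(reading_g: float, task: WeightTask) -> str:
    """Condition evaluation: map a scale reading to a discrete text directive.

    Stand-in for the prompt generator; the real system reads the scale
    from camera images rather than receiving a number.
    """
    return "stop" if reading_g >= task.target_g else "keep adding"


def run_episode(
    readings: Iterable[float],
    task: WeightTask,
    policy: Callable[[str], str],
) -> List[str]:
    """Action generation: the policy only ever sees the directive."""
    actions = []
    for reading in readings:
        prompt = directive_for(reading, task)
        actions.append(policy(prompt))
        if prompt == "stop":
            break  # threshold reached; later readings are ignored
    return actions


def toy_policy(prompt: str) -> str:
    """Placeholder for a VLA policy; returns a label instead of robot actions."""
    return f"action<{prompt}>"


if __name__ == "__main__":
    task = WeightTask(target_g=100.0)
    print(run_episode([20.0, 60.0, 100.0, 140.0], task, toy_policy))
```

Because the threshold logic is isolated in `directive_for`, changing the target weight in this sketch requires no change to the policy stand-in, which mirrors the modularity the abstract claims for CLAW.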

