RO · Apr 26

Move-Then-Operate: Behavioral Phasing for Human-Like Robotic Manipulation

Haoming Xu, Lei Lei, Jie Gu, Chu Tang, Jingmin Chen, Ruiqi Wang

arXiv:2604.23620 · 84.3

Predicted impact: top 14% in RO · last 90 days · Originality: Incremental advance

AI Analysis

For robotic manipulation, this work introduces a phase-disentangled architecture that improves efficiency and precision, though it is an incremental improvement over existing methods.

Move-Then-Operate decouples robotic manipulation into coarse relocation and contact-critical interaction phases. It achieves a 68.9% average success rate on RoboTwin2, outperforms the monolithic baseline by 24%, matches or exceeds models trained on 10x more data, and reaches peak performance in 40% fewer training steps.

We present Move-Then-Operate, a vision-language-action (VLA) framework that explicitly decouples robotic manipulation into two distinct behavioral phases: coarse relocation (move) and contact-critical interaction (operate). Unlike monolithic policies that conflate these heterogeneous regimes, our architecture employs a dual-expert policy routed by a learnable phase selector, introducing a structural inductive bias that isolates phase-specific dynamics. Phase labels are automatically generated via an MLLM-based pipeline conditioned on lightweight contextual cues, such as end-effector velocity and subtask decomposition, to ensure alignment with human motor patterns. Evaluated on the RoboTwin2 benchmark, our method achieves an average success rate of $68.9\%$, outperforming the monolithic $\pi_0$ baseline by $24\%$. It matches or exceeds models trained on $10\times$ more data and reaches peak performance in $40\%$ fewer training steps, demonstrating that architectural disentanglement of move and operate phases is a highly effective and efficient strategy for mastering high-precision manipulation.
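To make the routing idea concrete, the following is a minimal sketch of a dual-expert policy gated by a phase selector. It is not the authors' implementation: the names `PhaseSelector`, `move_expert`, `operate_expert`, and `act` are hypothetical, the gate is a hand-set logistic function of the distance to a target rather than a learned network, and both experts are simple hand-written controllers standing in for learned action heads.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def _norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(c * c for c in v))


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


@dataclass
class PhaseSelector:
    """Hypothetical gate. In a trained model these parameters would be learned;
    here they are hand-set so the gate flips near a 5 cm offset."""
    weight: float = -40.0
    bias: float = 2.0

    def prob_operate(self, offset: Sequence[float]) -> float:
        # Probability that the current step belongs to the "operate" phase.
        return _sigmoid(self.weight * _norm(offset) + self.bias)


def move_expert(offset: Sequence[float]) -> List[float]:
    """Coarse relocation (assumed form): step toward the target, capped at 5 cm."""
    dist = _norm(offset)
    if dist == 0.0:
        return [0.0 for _ in offset]
    scale = min(1.0, 0.05 / dist)
    return [c * scale for c in offset]


def operate_expert(offset: Sequence[float]) -> List[float]:
    """Contact-critical interaction (assumed form): small proportional correction."""
    return [0.2 * c for c in offset]


def act(selector: PhaseSelector, offset: Sequence[float],
        hard: bool = True) -> Tuple[str, List[float]]:
    """Route the observation to one expert (hard) or blend both (soft)."""
    p = selector.prob_operate(offset)
    phase = "operate" if p >= 0.5 else "move"
    if hard:
        expert = operate_expert if phase == "operate" else move_expert
        return phase, expert(offset)
    m, o = move_expert(offset), operate_expert(offset)
    return phase, [(1.0 - p) * mi + p * oi for mi, oi in zip(m, o)]


if __name__ == "__main__":
    selector = PhaseSelector()
    print(act(selector, [0.30, 0.0, 0.0]))  # far from target: routed to move
    print(act(selector, [0.01, 0.0, 0.0]))  # near target: routed to operate
```

Hard routing keeps each expert's outputs isolated to its own phase, which mirrors the structural separation described above. The soft variant is included only to show how a probabilistic gate could blend the two experts during training, which is an assumption rather than something the abstract states.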
