RO · AI · CL · CV · May 14

IntentVLA: Short-Horizon Intent Modeling for Aliased Robot Manipulation

arXiv:2605.14712 · 97.1
Predicted impact: top 4% in RO · last 90 days · Originality: Incremental advance
AI Analysis

For robot learning practitioners, IntentVLA targets short-horizon observation aliasing in multimodal imitation data, with reported gains in execution stability on manipulation tasks.

IntentVLA addresses inter-chunk conflict in robot manipulation under partial observability by encoding recent visual observations into a short-horizon intent representation. It achieves improved rollout stability and outperforms strong VLA baselines across AliasBench, SimplerEnv, LIBERO, and RoboCasa.

Robot imitation data are often multimodal: similar visual-language observations may be followed by different action chunks because human demonstrators act with different short-horizon intents, task phases, or recent context. Existing frame-conditioned VLA policies infer each chunk from the current observation and instruction alone, so under partial observability they may resample different intents across adjacent replanning steps, leading to inter-chunk conflict and unstable execution. We introduce IntentVLA, a history-conditioned VLA framework that encodes recent visual observations into a compact short-horizon intent representation and uses it to condition chunk generation. We further introduce AliasBench, a 12-task ambiguity-aware benchmark on RoboTwin2 with matched training data and evaluation environments that isolate short-horizon observation aliasing. Across AliasBench, SimplerEnv, LIBERO, and RoboCasa, IntentVLA improves rollout stability and outperforms strong VLA baselines.
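The inter-chunk conflict described in the abstract can be illustrated with a toy 1-D rollout. This is a minimal sketch under stated assumptions, not the paper's architecture: the functions `frame_policy`, `intent_policy`, and `rollout`, the net-displacement "intent" summary, and all parameter values are hypothetical stand-ins chosen for illustration. The demonstrations are assumed to contain two modes (move left or move right) from any position, so a single frame does not reveal which mode is active, while a short window of recent observations does.

```python
import random

def frame_policy(obs, rng):
    # Frame-conditioned: only the current observation is visible. Both demo
    # modes are assumed equally likely from any position, so each replanning
    # step samples a direction independently of the previous chunk.
    return rng.choice([-1, 1])

def intent_policy(history, rng):
    # Hypothetical stand-in for a learned intent encoder: summarize the
    # recent observation window as net displacement and continue that way.
    drift = history[-1] - history[0]
    if drift == 0:
        return rng.choice([-1, 1])  # no usable context yet: still ambiguous
    return 1 if drift > 0 else -1

def rollout(use_history, n_chunks=20, chunk_len=4, window=3, seed=0):
    """Run one episode and count direction flips between adjacent chunks."""
    rng = random.Random(seed)
    pos, history, directions = 0, [0], []
    for _ in range(n_chunks):
        if use_history:
            d = intent_policy(history[-window:], rng)
        else:
            d = frame_policy(pos, rng)
        directions.append(d)
        for _ in range(chunk_len):  # execute the whole chunk open-loop
            pos += d
            history.append(pos)
    return sum(a != b for a, b in zip(directions, directions[1:]))

if __name__ == "__main__":
    print("frame-conditioned flips:  ", rollout(use_history=False))
    print("history-conditioned flips:", rollout(use_history=True))
```

In this toy, the history-conditioned rollout commits to whichever direction the first chunk sampled, so its flip count stays at zero by construction, while the frame-conditioned rollout can reverse direction at any replanning step. A real system would replace the displacement heuristic with a learned encoder over recent visual observations.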