RO · AI · May 29

Completion at the Boundary (CaB): Deployable Switching with Completion-Aware Control under Limited Calibration

arXiv:2606.00145 · 34.2 · h-index: 6
Predicted impact: top 61% in RO · last 90 days · Originality: Incremental advance
AI Analysis

This work addresses the underexplored problem of task completion detection in deployed VLA agents, which is critical for reliable execution of composite instructions.

CaB introduces a completion-aware switching mechanism for VLA agents that predicts Boundary-Phase Tokens (Before/Hit/After). Under a low-calibration regime, it improves composite task execution and handoff quality over baselines on a first-person Minecraft benchmark.

Vision-language-action (VLA) agents can execute natural-language instructions, yet deployed systems still lack an operational interface: deciding when the instruction is complete. This gap is acute in short composites ("do A, then B"), where mistimed handoffs cascade into downstream failures. Completion is inherently closed-loop because switching is an intervention that changes the instruction context and thus future actions and observations. We study completion under a deployable low-calibration regime motivated by open-ended instruction spaces, enforcing no test-time relearning and a single globally calibrated switching rule selected once on the development set and reused unchanged on the test set. Under this constraint, collapsing asymmetric boundary evidence into a single scalar can be brittle under polarity shifts across tasks. We propose Completion at the Boundary (CaB), which predicts an event-local completion object in the form of Boundary-Phase Tokens (Before/Hit/After), retaining two-sided boundary evidence under this discipline. CaB-When converts this completion object into a minimal, auditable switching decision (when), while CaB-How reuses the same completion object to condition action generation for boundary-stable control through handoffs (how). Using an intervention-aware E1/E2 protocol, we show that CaB improves composite execution and handoff quality on a first-person Minecraft VLA benchmark under matched capacity and deployability constraints.
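The calibration discipline described in the abstract (one switching rule chosen on the development set, then frozen for the test set) can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the abstract does not specify the actual CaB-When rule, so the margin-based rule, the timing-error cost, and the names `should_switch`, `first_switch_step`, and `calibrate_margin` are hypothetical. Each time step is assumed to carry a probability triple over the Before/Hit/After phases.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical per-step prediction: probabilities for (Before, Hit, After).
PhaseProbs = Tuple[float, float, float]


def should_switch(probs: PhaseProbs, margin: float) -> bool:
    """Assumed rule: switch once Hit+After evidence exceeds Before by `margin`.

    Keeping Before and Hit/After as separate terms is one simple way to use
    two-sided boundary evidence; the paper's actual rule may differ.
    """
    p_before, p_hit, p_after = probs
    return (p_hit + p_after) - p_before >= margin


def first_switch_step(episode: Sequence[PhaseProbs], margin: float) -> Optional[int]:
    """Return the first step at which the rule fires, or None if it never does."""
    for t, probs in enumerate(episode):
        if should_switch(probs, margin):
            return t
    return None


def calibrate_margin(
    dev_episodes: Sequence[Sequence[PhaseProbs]],
    dev_true_steps: Sequence[int],
    candidates: Sequence[float],
) -> float:
    """Select ONE global margin on the development set (illustrative cost).

    The cost is the total absolute timing error; an episode where the rule
    never fires is charged as if it switched at the episode's end.
    """
    def cost(margin: float) -> int:
        total = 0
        for episode, true_t in zip(dev_episodes, dev_true_steps):
            t = first_switch_step(episode, margin)
            total += abs((len(episode) if t is None else t) - true_t)
        return total

    return min(candidates, key=cost)
```

In use, `calibrate_margin` would be called once on development episodes, and the returned margin passed unchanged to `first_switch_step` for every test episode, with no per-task or test-time retuning.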

