RO | Mar 25

Event-Driven Proactive Assistive Manipulation with Grounded Vision-Language Planning

arXiv:2603.23950 | 76.9 | h-index: 3
AI Analysis

This paper addresses the challenge of making robot assistance more fluent and intuitive in human-robot collaboration. The contribution is incremental, since it builds on existing vision-language planning methods.

The paper tackles proactive robot assistance in collaborative manipulation by shifting from request-driven to event-driven assistance: robot actions are initiated by observed state transitions that result from human-object interactions. On a real tabletop task, it demonstrates improved proactive completion on solvable scenes and appropriate waiting on unsolvable ones.

Assistance in collaborative manipulation is often initiated by user instructions, making high-level reasoning request-driven. In fluent human teamwork, however, partners often infer the next helpful step from the observed outcome of an action rather than waiting for instructions. Motivated by this, we introduce a shift from request-driven assistance to event-driven proactive assistance, where robot actions are initiated by workspace state transitions induced by human-object interactions rather than user-provided task instructions. To this end, we propose an event-driven framework that tracks interaction progress with an event monitor and, upon event completion, extracts stabilized pre/post snapshots that characterize the resulting state transition. Given the stabilized snapshots, the planner analyzes the implied state transition to infer a task-level goal and decide whether to intervene; if so, it generates a sequence of assistive actions. To make outputs executable and verifiable, we restrict actions to a set of action primitives and reference objects via integer IDs. We evaluate the framework on a real tabletop number-block collaboration task, demonstrating that explicit pre/post state-change evidence improves proactive completion on solvable scenes and appropriate waiting on unsolvable ones.
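
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a scene is a mapping from integer object IDs to 2D positions, that an event is "complete" once the scene has changed and then stayed still for a few frames, and that the planner returns a list of (primitive, integer IDs) pairs. The names `EventMonitor`, `moved`, `validate_plan`, the `PRIMITIVES` table, and the thresholds are all hypothetical; the abstract does not list the actual primitives or the stabilization rule.

```python
from typing import Dict, List, Optional, Tuple

# Assumed scene representation: integer object ID -> (x, y) position.
Snapshot = Dict[int, Tuple[float, float]]

# Hypothetical primitive set: name -> number of integer-ID arguments.
PRIMITIVES = {"pick": 1, "place": 2, "wait": 0}


def moved(a: Snapshot, b: Snapshot, tol: float = 0.01) -> bool:
    """True if any object appeared, vanished, or shifted by more than tol."""
    if a.keys() != b.keys():
        return True
    return any(
        abs(a[k][0] - b[k][0]) > tol or abs(a[k][1] - b[k][1]) > tol for k in a
    )


class EventMonitor:
    """Emits (pre, post) after the scene changes and then holds still."""

    def __init__(self, hold: int = 3):
        self.hold = hold                      # still frames needed to stabilize
        self.pre: Optional[Snapshot] = None   # last stable snapshot
        self.last: Optional[Snapshot] = None  # previous frame
        self.still = 0                        # consecutive unchanged frames
        self.active = False                   # an interaction is in progress

    def update(self, frame: Snapshot) -> Optional[Tuple[Snapshot, Snapshot]]:
        if self.last is None:
            # The first frame is taken as the initial stable state.
            self.pre = self.last = frame
            return None
        if moved(self.last, frame):
            self.active, self.still = True, 0
        else:
            self.still += 1
        self.last = frame
        if self.active and self.still >= self.hold:
            event = (self.pre, frame)
            self.pre, self.active = frame, False
            # Report only interactions that left a net state change.
            return event if moved(*event) else None
        return None


def validate_plan(plan: List[Tuple[str, List[int]]], post: Snapshot) -> bool:
    """Accept only known primitives whose integer IDs exist in the scene."""
    for name, args in plan:
        if name not in PRIMITIVES or len(args) != PRIMITIVES[name]:
            return False
        if not all(isinstance(i, int) and i in post for i in args):
            return False
    return True
```

In this sketch, a planner (for example a vision-language model) would be called only when `update` returns a pre/post pair, and its output would be executed only if `validate_plan` accepts it. Restricting the output to a fixed primitive table and known integer IDs is what makes it checkable before execution.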
