A Generative System for Robot-to-Human Handovers: from Intent Inference to Spatial Configuration Imagery
This work addresses the problem of natural human-robot collaboration in handover tasks, though the contribution appears incremental, since it combines existing techniques in a new way.
The paper tackles robot-to-human object handover by inferring human intent from multimodal cues and generating spatial configurations with a diffusion model, achieving fluent and human-like handovers in experiments.
We propose a novel system for robot-to-human object handover that emulates human coworker interactions. Unlike most existing studies, which focus primarily on grasping strategies and motion planning, our system focuses on (1) inferring human handover intent and (2) imagining the spatial handover configuration. The first component integrates multimodal perception, combining visual and verbal cues, to infer human intent. The second uses a diffusion-based model to generate the handover configuration, which involves the spatial relationship among the robot's gripper, the object, and the human hand, thereby mimicking the cognitive process of motor imagery. Experimental results demonstrate that our approach effectively interprets human cues and achieves fluent, human-like handovers, offering a promising solution for collaborative robotics. Code, videos, and data are available at: https://i3handover.github.io.