RO · AI · HC · Sep 30, 2024

Robi Butler: Multimodal Remote Interaction with a Household Robot Assistant

arXiv:2409.20548v2 · 8 citations · h-index: 10
AI Analysis

This work addresses the challenge of remote human-robot interaction for household assistance, representing an incremental step towards practical robot assistants.

The authors address remote multimodal interaction with household robots by developing Robi Butler, which uses LLMs and vision-language models to interpret and execute complex user commands in real-world home environments in a zero-shot manner. They demonstrate the system through evaluations on various household tasks and a user study.

Imagine a future when we can Zoom-call a robot to manage household chores remotely. This work takes one step in this direction. Robi Butler is a new household robot assistant that enables seamless multimodal remote interaction. It allows the human user to monitor its environment from a first-person view, issue voice or text commands, and specify target objects through hand-pointing gestures. At its core, a high-level behavior module, powered by Large Language Models (LLMs), interprets multimodal instructions to generate multistep action plans. Each plan consists of open-vocabulary primitives supported by vision-language models, enabling the robot to process both textual and gestural inputs. Zoom provides a convenient interface to implement remote interactions between the human and the robot. The integration of these components allows Robi Butler to ground remote multimodal instructions in real-world home environments in a zero-shot manner. We evaluated the system on various household tasks, demonstrating its ability to execute complex user commands with multimodal inputs. We also conducted a user study to examine how multimodal interaction influences user experiences in remote human-robot interaction. These results suggest that with the advances in robot foundation models, we are moving closer to the reality of remote household robot assistants.
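
As a rough illustration of the architecture described above, the sketch below models a multistep plan as a list of primitive calls whose targets come either from text or from a pointing gesture. It is not the authors' implementation: the names `Target`, `Step`, and `plan_from_instruction`, the primitive names `go_to`, `pick`, and `deliver`, and the "fetch" rule are all hypothetical, and a simple hand-written rule stands in for the LLM planner.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Target:
    """An object reference: a text description or a pointed-at pixel (hypothetical)."""
    description: Optional[str] = None        # e.g. "the red mug"
    pixel: Optional[Tuple[int, int]] = None  # (u, v) in the first-person camera image

    def modality(self) -> str:
        return "gesture" if self.pixel is not None else "text"


@dataclass(frozen=True)
class Step:
    """One open-vocabulary primitive call in a plan (primitive names are assumed)."""
    primitive: str
    target: Target


def plan_from_instruction(
    command: str, pointed: Optional[Tuple[int, int]] = None
) -> List[Step]:
    """Stand-in for the LLM planner: a fixed rule, not a real language model.

    Deictic words ("this"/"that") are resolved with the pointing gesture when
    one is available; otherwise the words after the verb describe the target.
    """
    words = command.lower().rstrip(".!?").split()
    if not words:
        return []
    verb, rest = words[0], words[1:]
    deictic = any(w in ("this", "that") for w in rest)
    if deictic and pointed is not None:
        target = Target(pixel=pointed)
    else:
        target = Target(description=" ".join(rest))
    if verb == "fetch":
        # Assumed decomposition into a multistep plan.
        return [Step("go_to", target), Step("pick", target), Step("deliver", target)]
    return [Step(verb, target)]
```

Calling `plan_from_instruction("Fetch that", pointed=(320, 240))` yields three steps that share a gesture-based target, while `plan_from_instruction("pick the red mug")` yields a single step with a text-based target. In a real system each primitive would presumably hand its target to a vision-language model for grounding; that part is omitted here.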

