RO, AI, CL, CV | Dec 21, 2023

Compositional Zero-Shot Learning for Attribute-Based Object Reference in Human-Robot Interaction

arXiv:2312.13655v1 | h-index: 12
Originality: Synthesis-oriented
AI Analysis

This paper addresses the challenge of open-world human-robot interaction, where the sets of objects and attributes are unbounded. The contribution appears incremental, since it builds on existing compositional zero-shot learning approaches.

The paper tackles the problem of enabling robots to identify objects from natural language commands when no prior visual observations of those objects are available. It implements an attribute-based compositional zero-shot learning method and reports preliminary results showing correct object identification on the MIT-States and Clothing 16K datasets.

Language-enabled robots have been widely studied in recent years to enable natural human-robot interaction and teaming in various real-world applications. Such robots must be able to comprehend referring expressions, that is, to identify a particular object from visual perception using a set of referring attributes extracted from natural language. However, visual observations of an object may not be available when it is referred to, and the number of objects and attributes may be unbounded in open worlds. To address these challenges, we implement an attribute-based compositional zero-shot learning method that uses a list of attributes to perform referring expression comprehension in open worlds. We evaluate the approach on two datasets, MIT-States and Clothing 16K. The preliminary experimental results show that our implemented approach allows a robot to correctly identify the objects referred to by human commands.
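The abstract does not describe the method's internals, so the following is only a minimal sketch of the general idea behind attribute-based compositional zero-shot matching, not the authors' implementation. It assumes attributes and object classes each have an embedding, that a query is composed by summing them, and that candidates are ranked by cosine similarity. All names (`ATTRIBUTE_VECS`, `OBJECT_VECS`, `compose`, `resolve_reference`) and all vector values are hypothetical.

```python
import math

# Hypothetical toy embeddings; a real system would learn these from data.
ATTRIBUTE_VECS = {
    "red":  [1.0, 0.0, 0.0, 0.0],
    "blue": [0.0, 1.0, 0.0, 0.0],
}
OBJECT_VECS = {
    "shirt": [0.0, 0.0, 1.0, 0.0],
    "dress": [0.0, 0.0, 0.0, 1.0],
}


def compose(attributes, obj):
    """Compose a query vector by summing the object and attribute vectors.

    Element-wise summation is an assumed, deliberately simple composition rule.
    """
    vec = list(OBJECT_VECS[obj])
    for attr in attributes:
        vec = [v + a for v, a in zip(vec, ATTRIBUTE_VECS[attr])]
    return vec


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def resolve_reference(attributes, obj, candidates):
    """Return the id of the candidate most similar to the composed query."""
    query = compose(attributes, obj)
    return max(candidates, key=lambda cid: cosine(query, candidates[cid]))


# Candidate features stand in for the output of a visual encoder (made-up values).
candidates = {
    "item_a": [0.9, 0.1, 0.8, 0.0],  # resembles a red shirt
    "item_b": [0.1, 0.9, 0.7, 0.1],  # resembles a blue shirt
    "item_c": [0.1, 0.8, 0.0, 0.9],  # resembles a blue dress
    "item_d": [0.8, 0.0, 0.1, 0.9],  # resembles a red dress
}

# "red" and "dress" are each known, but the pair needs no joint training example.
print(resolve_reference(["red"], "dress", candidates))
```

The point of the sketch is that a query such as "red dress" can be resolved from separately known primitives ("red", "dress") even if that combination was never observed together, which is the sense in which the setting is compositional and zero-shot.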

