RO · Jul 9, 2021

Using Depth for Improving Referring Expression Comprehension in Real-World Environments

arXiv:2107.04658v1
Originality: Incremental advance
AI Analysis

This work addresses accurate object identification in human-robot collaboration. It is an incremental advance: it builds on existing RGB-based methods by adding depth features.

The paper improves referring expression comprehension in real-world environments by incorporating depth information. It reports significant performance gains, particularly in scenes where depth is critical for disambiguating objects.

In a human-robot collaborative task where a robot helps its partner by finding described objects, the depth dimension plays a critical role in successful task completion. Existing studies have mostly focused on comprehending the object descriptions using RGB images. However, 3-dimensional space perception that includes depth information is fundamental in real-world environments. In this work, we propose a method to identify the described objects considering depth dimension data. Using depth features significantly improves performance in scenes where depth data is critical to disambiguate the objects and across our whole evaluation dataset that contains objects that can be specified with and without the depth dimension.

