RO | AI | CV | Oct 26, 2024

EfficientEQA: An Efficient Approach to Open-Vocabulary Embodied Question Answering

arXiv:2410.20263v2 | 12 citations | h-index: 25 | Has Code | IROS
Originality: Highly original
AI Analysis

The paper addresses the challenge of getting robot assistants to explore efficiently and answer accurately in open-vocabulary settings. Its reported gains are substantial rather than incremental.

The paper tackles the problem of open-vocabulary embodied question answering by introducing EfficientEQA, a framework that combines efficient exploration with free-form answer generation, achieving over 15% higher answer accuracy and over 20% fewer exploration steps than state-of-the-art methods.

Embodied Question Answering (EQA) is an essential yet challenging task for robot assistants. Large vision-language models (VLMs) have shown promise for EQA, but existing approaches either treat it as static video question answering without active exploration or restrict answers to a closed set of choices. These limitations hinder real-world applicability, where a robot must explore efficiently and provide accurate answers in open-vocabulary settings. To overcome these challenges, we introduce EfficientEQA, a novel framework that couples efficient exploration with free-form answer generation. EfficientEQA features three key innovations: (1) Semantic-Value-Weighted Frontier Exploration (SFE) with Verbalized Confidence (VC) from a black-box VLM to prioritize semantically important areas to explore, enabling the agent to gather relevant information faster; (2) a BLIP relevancy-based mechanism to stop adaptively by flagging highly relevant observations as outliers to indicate whether the agent has collected enough information; and (3) a Retrieval-Augmented Generation (RAG) method for the VLM to answer accurately based on pertinent images from the agent's observation history without relying on predefined choices. Our experimental results show that EfficientEQA achieves over 15% higher answer accuracy and requires over 20% fewer exploration steps than state-of-the-art methods. Our code is available at: https://github.com/chengkaiAcademyCity/EfficientEQA
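The three components described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the distance-discounted frontier weighting, the z-score outlier rule, and the thresholds are all assumptions made for illustration, and the semantic values and relevancy scores are plain floats standing in for VLM verbalized confidence and BLIP relevancy outputs.

```python
import statistics


def choose_frontier(frontiers):
    """Pick the frontier to explore next.

    `frontiers` is a list of (name, semantic_value, distance) tuples.
    ASSUMPTION: semantic_value stands in for a VLM's verbalized confidence,
    and the distance discount below is a hypothetical weighting, not the
    paper's formula.
    """
    return max(frontiers, key=lambda f: f[1] / (1.0 + f[2]))


def is_relevancy_outlier(history, new_score, z_threshold=2.0, min_history=5):
    """Return True if `new_score` is unusually high compared to `history`.

    ASSUMPTION: a one-sided z-score test stands in for the paper's
    outlier-based stopping rule; the threshold values are illustrative.
    """
    if len(history) < min_history:
        return False  # too few observations to estimate a baseline
    mean = statistics.fmean(history)
    std = statistics.pstdev(history)
    if std == 0:
        return False  # all past scores identical; no meaningful z-score
    return (new_score - mean) / std > z_threshold


def top_k_observations(scores, k=3):
    """Return indices of the k most relevant observations, best first.

    ASSUMPTION: retrieval for answer generation is approximated by ranking
    stored observations on their relevancy score.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

In an exploration loop built on this sketch, the agent would call `choose_frontier` at each step, score the new observation, stop once `is_relevancy_outlier` returns True, and then pass the images selected by `top_k_observations` to the VLM together with the question.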

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
