RO | AI | Oct 6, 2022

Embodied Referring Expression for Manipulation Question Answering in Interactive Environment

arXiv:2210.02709v1 | 8 citations | h-index: 11
Originality: Synthesis-oriented
AI Analysis

This paper addresses the challenge of active object manipulation by embodied AI agents in interactive environments. It is an incremental advance that combines existing tasks.

The paper introduces a new task, Remote Embodied Manipulation Question Answering (REMQA), which combines embodied referring expression with manipulation so that an embodied agent must act on objects in an interactive environment to answer a question. It also presents a framework evaluated on a benchmark dataset built in the AI2-THOR simulator.

With the progress of Embodied AI in recent years, embodied agents are expected to perform more complicated tasks in interactive environments. Existing embodied tasks, including Embodied Referring Expression (ERE) and other QA-form tasks, mainly focus on interaction in terms of linguistic instruction. Therefore, enabling the agent to actively manipulate objects in the environment for exploration has become a challenging problem for the community. To solve this problem, we introduce a new embodied task, Remote Embodied Manipulation Question Answering (REMQA), which combines ERE with manipulation tasks. In the REMQA task, the agent needs to navigate to a remote position and manipulate the target object to answer the question. We build a benchmark dataset for the REMQA task in the AI2-THOR simulator. For this task, we propose a framework with 3D semantic reconstruction and modular network paradigms. We evaluate the proposed framework on the REMQA dataset to validate its effectiveness.
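
The navigate, manipulate, answer structure described in the abstract can be illustrated with a toy sketch. This is not the paper's framework, dataset format, or the AI2-THOR API: the `ToyScene` class, the `run_episode` function, the action strings, and the counting question are all hypothetical, chosen only to show why a manipulation step may be needed before a question can be answered.

```python
from dataclasses import dataclass, field


@dataclass
class ToyScene:
    """Hypothetical 1-D scene; a stand-in, not the AI2-THOR simulator."""
    agent_pos: int = 0          # agent position along a corridor
    container_pos: int = 4      # remote position of the referred container
    container_open: bool = False
    contents: list = field(default_factory=lambda: ["apple", "apple", "egg"])

    def step(self, action: str) -> None:
        # Action names are toy strings assumed for this sketch.
        if action == "MoveAhead":
            self.agent_pos += 1
        elif action == "Open" and self.agent_pos == self.container_pos:
            self.container_open = True

    def visible_objects(self) -> list:
        # Contents become observable only after the container is opened.
        if self.container_open and self.agent_pos == self.container_pos:
            return list(self.contents)
        return []


def run_episode(scene: ToyScene, target_object: str) -> int:
    """Answer 'how many <target_object> are in the container?' (assumed question)."""
    # Stage 1: navigate to the remote position of the referred object.
    while scene.agent_pos < scene.container_pos:
        scene.step("MoveAhead")
    # Stage 2: manipulate the object so hidden information becomes visible.
    scene.step("Open")
    # Stage 3: answer the question from the current observation.
    return scene.visible_objects().count(target_object)


if __name__ == "__main__":
    print(run_episode(ToyScene(), "apple"))
```

In this toy setting the question cannot be answered from the starting position, since `visible_objects()` stays empty until the agent has both reached the container and opened it. That dependency is the property the REMQA task description emphasizes.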

