CV, AI, CL, LG | Nov 30, 2017

Embodied Question Answering

arXiv:1711.11543v2 | 765 citations
Originality: Highly original
AI Analysis

This work addresses the problem of integrating active perception, language understanding, and goal-driven navigation for AI agents, which is significant for researchers working on embodied AI and robotics.

This paper introduces Embodied Question Answering (EmbodiedQA), a new AI task where an agent navigates a 3D environment using first-person vision to answer questions about objects within it. The authors developed the necessary environments, end-to-end reinforcement learning agents, and evaluation protocols for this task.

We present a new AI task -- Embodied Question Answering (EmbodiedQA) -- where an agent is spawned at a random location in a 3D environment and asked a question ("What color is the car?"). In order to answer, the agent must first intelligently navigate to explore the environment, gather information through first-person (egocentric) vision, and then answer the question ("orange"). This challenging task requires a range of AI skills -- active perception, language understanding, goal-driven navigation, commonsense reasoning, and grounding of language into actions. In this work, we develop the environments, end-to-end-trained reinforcement learning agents, and evaluation protocols for EmbodiedQA.
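The episode structure described in the abstract (spawn at a random location, navigate, observe egocentrically, answer) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `ToyEnv`, `run_episode`, the 1-D corridor, and the scripted sweep are hypothetical and are not the authors' environments or agents. In particular, the sweep stands in for the learned reinforcement learning policy, and a cell lookup stands in for first-person vision.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyEnv:
    """Hypothetical 1-D corridor; each cell holds None or an (object, color) pair."""
    cells: list
    pos: int = 0

    def reset(self, rng):
        # Spawn the agent at a random location, as in the task description.
        self.pos = rng.randrange(len(self.cells))
        return self.observe()

    def observe(self):
        # Stand-in for egocentric vision: only the current cell is visible.
        return self.cells[self.pos]

    def step(self, action):
        # Actions: -1 (move left) or +1 (move right), clipped at the walls.
        self.pos = max(0, min(len(self.cells) - 1, self.pos + action))
        return self.observe()


def run_episode(env, target, rng, max_steps=50):
    """Navigate until `target` is seen, then answer with its color."""
    obs = env.reset(rng)
    direction = 1
    for _ in range(max_steps):
        if obs is not None and obs[0] == target:
            return obs[1]  # answer the question
        # Scripted sweep (not a learned policy): reverse direction at either wall.
        if env.pos == len(env.cells) - 1:
            direction = -1
        elif env.pos == 0:
            direction = 1
        obs = env.step(direction)
    return None  # step budget exhausted without an answer


# "What color is the car?"
env = ToyEnv(cells=[None, None, ("sofa", "gray"), None, ("car", "orange")])
answer = run_episode(env, target="car", rng=random.Random(0))
print(answer)  # orange
```

The sketch only shows the control flow: the answer is produced after navigation has brought the relevant object into view, not from the question alone.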

Code Implementations: 4 repos
