CV AI CL LG Apr 6, 2019

Embodied Question Answering in Photorealistic Environments with Point Cloud Perception

Erik Wijmans, Samyak Datta, Oleksandr Maksymets, Abhishek Das, Georgia Gkioxari, Stefan Lee, Irfan Essa, Devi Parikh, Dhruv Batra

arXiv:1904.03461v1 27.2 202 citations h-index: 68

Originality: Incremental advance

AI Analysis

This work addresses the gap between internet-style vision problems and vision for embodied AI agents. It is an incremental advance because it builds on an existing task and existing baselines.

The paper tackles Embodied Question Answering in photorealistic environments by studying navigation policies that use 3D point clouds, RGB images, or their combination. It finds that point clouds provide a richer signal than RGB images for learning obstacle avoidance, and it introduces a novel loss-weighting scheme, Inflection Weighting, that improves recurrent models trained with behavior cloning.

To help bridge the gap between internet vision-style problems and the goal of vision for embodied perception, we instantiate a large-scale navigation task -- Embodied Question Answering [1] in photo-realistic environments (Matterport 3D). We thoroughly study navigation policies that utilize 3D point clouds, RGB images, or their combination. Our analysis of these models reveals several key findings. We find that two seemingly naive navigation baselines, forward-only and random, are strong navigators and challenging to outperform, due to the specific choice of the evaluation setting presented by [1]. We find a novel loss-weighting scheme we call Inflection Weighting to be important when training recurrent models for navigation with behavior cloning, and we are able to outperform the baselines with this technique. We find that point clouds provide a richer signal than RGB images for learning obstacle avoidance, motivating the use (and continued study) of 3D deep learning models for embodied navigation.
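The abstract names Inflection Weighting but does not define it. The sketch below shows one plausible reading, offered as an assumption rather than the authors' implementation: a timestep counts as an "inflection" when the expert action differs from the previous one, and those timesteps get a larger weight in the behavior-cloning loss. The function names, the weight estimate (total actions divided by inflection count), and the normalisation by the sum of weights are all illustrative choices.

```python
import math


def estimate_inflection_weight(trajectories):
    """Assumed estimate: total number of actions / number of inflection points."""
    total = sum(len(traj) for traj in trajectories)
    inflections = sum(
        1
        for traj in trajectories
        for t in range(1, len(traj))
        if traj[t] != traj[t - 1]
    )
    # With no action changes at all, fall back to uniform weighting.
    return total / inflections if inflections else 1.0


def inflection_weights(actions, inflection_weight):
    """Per-timestep weights: `inflection_weight` where the action changes, else 1.0."""
    weights = []
    for t, action in enumerate(actions):
        is_inflection = t > 0 and action != actions[t - 1]
        weights.append(inflection_weight if is_inflection else 1.0)
    return weights


def weighted_bc_loss(action_probs, actions, weights):
    """Weighted negative log-likelihood of expert actions, normalised by sum of weights."""
    total = 0.0
    for probs, action, weight in zip(action_probs, actions, weights):
        total += -weight * math.log(probs[action])
    return total / sum(weights)


# Hypothetical expert trajectory over 4 discrete actions (0 = forward, 1 = turn).
expert = [0, 0, 0, 1, 0, 0]
coeff = estimate_inflection_weight([expert])
weights = inflection_weights(expert, coeff)
uniform_policy = [[0.25, 0.25, 0.25, 0.25] for _ in expert]
loss = weighted_bc_loss(uniform_policy, expert, weights)
```

The intent of such a scheme is to stop long runs of repeated actions (for example, many consecutive "forward" steps) from dominating the loss, so the recurrent policy is pushed to get the rarer action changes right.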
