CV, Nov 29, 2022

Instance-Specific Image Goal Navigation: Training Embodied Agents to Find Object Instances

arXiv:2211.15876v1, 68 citations, h-index: 85
Originality: Synthesis-oriented
AI Analysis

This work addresses the lack of standardization in image-goal navigation for embodied AI agents, along with practical limitations of existing formulations, though it is incremental in that it builds on existing tasks and datasets.

The authors tackled the problem of embodied visual navigation with image goals by introducing the Instance-specific ImageNav task, which addresses ambiguities and rigidities in existing formulations, resulting in a standardized benchmark released in the Habitat Simulator using HM3D scenes.

We consider the problem of embodied visual navigation given an image-goal (ImageNav) where an agent is initialized in an unfamiliar environment and tasked with navigating to a location 'described' by an image. Unlike related navigation tasks, ImageNav does not have a standardized task definition, which makes comparison across methods difficult. Further, existing formulations have two problematic properties: (1) image-goals are sampled from random locations, which can lead to ambiguity (e.g., looking at walls), and (2) image-goals match the camera specification and embodiment of the agent; this rigidity is limiting when considering user-driven downstream applications. We present the Instance-specific ImageNav task (InstanceImageNav) to address these limitations. Specifically, the goal image is 'focused' on some particular object instance in the scene and is taken with camera parameters independent of the agent. We instantiate InstanceImageNav in the Habitat Simulator using scenes from the Habitat-Matterport3D dataset (HM3D) and release a standardized benchmark to measure community progress.
