CV · Aug 24, 2019

Situational Fusion of Visual Representation for Visual Navigation

arXiv:1908.09073v2 · 70 citations
Originality: Incremental advance
AI Analysis

This paper addresses robust visual navigation for agents in unseen environments. The advance is incremental because it builds on existing fusion methods.

The paper tackles visual navigation by training an agent to fuse diverse visual representations based on situational understanding, resulting in significantly improved performance in novel environments compared to baselines.

A complex visual navigation task puts an agent in different situations which call for a diverse range of visual perception abilities. For example, to "go to the nearest chair", the agent might need to identify a chair in a living room using semantics, follow along a hallway using vanishing point cues, and avoid obstacles using depth. Therefore, utilizing the appropriate visual perception abilities based on a situational understanding of the visual environment can empower these navigation models in unseen visual environments. We propose to train an agent to fuse a large set of visual representations that correspond to diverse visual perception abilities. To fully utilize each representation, we develop an action-level representation fusion scheme, which predicts an action candidate from each representation and adaptively consolidates these action candidates into the final action. Furthermore, we employ a data-driven inter-task affinity regularization to reduce redundancies and improve generalization. Our approach leads to significantly improved performance in novel environments over an ImageNet-pretrained baseline and other fusion methods.
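
The action-level fusion idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify how candidates are consolidated, so this example assumes a simple gated weighted average: each representation yields action logits, a hypothetical situational gate yields one score per representation, and the fused policy is the gate-weighted mixture of the per-representation action distributions. The names `fuse_actions`, `gate_scores`, and the three-action set are illustrative assumptions, not the paper's implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_actions(candidate_logits, gate_scores):
    """Fuse per-representation action candidates into one distribution.

    candidate_logits: one list of action logits per visual representation.
    gate_scores: one scalar per representation (assumed output of a
        hypothetical situational gate; not taken from the paper).
    Returns (fused action distribution, fusion weights).
    """
    weights = softmax(gate_scores)
    candidates = [softmax(logits) for logits in candidate_logits]
    n_actions = len(candidates[0])
    fused = [
        sum(w * c[a] for w, c in zip(weights, candidates))
        for a in range(n_actions)
    ]
    return fused, weights


# Hypothetical action set and representations, for illustration only.
ACTIONS = ["forward", "turn_left", "turn_right"]
candidate_logits = [
    [2.0, 0.0, 0.0],  # semantic features favor "forward"
    [0.0, 1.0, 0.0],  # vanishing-point features favor "turn_left"
    [0.0, 0.0, 3.0],  # depth features favor "turn_right" (obstacle ahead)
]
gate_scores = [0.0, 0.0, 4.0]  # the gate trusts depth in this situation

fused, weights = fuse_actions(candidate_logits, gate_scores)
best = ACTIONS[max(range(len(fused)), key=fused.__getitem__)]
print(best, [round(p, 3) for p in fused])
```

Because both the weights and each candidate are normalized, the fused output is itself a valid probability distribution. In a learned system the gate scores would presumably come from a network conditioned on the current observation rather than being fixed by hand.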
