CV · AI · CL · Oct 10, 2022

Transformer-based Localization from Embodied Dialog with Large-scale Pre-training

arXiv:2210.04864v1297 citations · h-index: 82
Originality: Incremental advance
AI Analysis

This paper addresses agent localization in unknown environments for embodied AI and represents an incremental improvement over prior methods.

The paper tackles the task of Localization via Embodied Dialog (LED) by developing a novel LED-Bert architecture with a graph-based scene representation, which outperforms previous baselines.

We address the challenging task of Localization via Embodied Dialog (LED). Given a dialog from two agents, an Observer navigating through an unknown environment and a Locator who is attempting to identify the Observer's location, the goal is to predict the Observer's final location in a map. We develop a novel LED-Bert architecture and present an effective pretraining strategy. We show that a graph-based scene representation is more effective than the top-down 2D maps used in prior works. Our approach outperforms previous baselines.

