CV · Mar 11, 2024

DiaLoc: An Iterative Approach to Embodied Dialog Localization

arXiv:2403.06846v1 · 5 citations · h-index: 21 · CVPR
Originality: Incremental advance
AI Analysis

This work addresses a practical gap in dialog-based localization for real-world applications, moving beyond the navigation focus of most embodied dialog research.

The paper tackles the understudied problem of embodied dialog localization by proposing DiaLoc, an iterative framework that refines location predictions after each dialog turn, achieving state-of-the-art results with improvements of +7.08% in single-shot and +10.85% in multi-shot settings on Acc5@valUnseen.
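The Acc5 metric is not defined on this page. Below is a minimal sketch, assuming Acc5 follows the convention of earlier dialog-localization benchmarks: the fraction of episodes whose predicted location lies within 5 m of the true location. For simplicity it uses straight-line Euclidean distance on 2D coordinates in meters; the benchmark's exact distance definition may differ. `acc_at_k` is a hypothetical helper, not code from the paper.

```python
import numpy as np


def acc_at_k(pred_xy, gt_xy, k_meters=5.0):
    """Fraction of episodes whose predicted position is within k_meters of ground truth.

    Assumptions (hypothetical, not from the paper): positions are (N, 2) arrays
    in meters, and error is the Euclidean distance with an inclusive threshold.
    """
    pred = np.asarray(pred_xy, dtype=float)
    gt = np.asarray(gt_xy, dtype=float)
    errors = np.linalg.norm(pred - gt, axis=1)  # per-episode localization error
    return float(np.mean(errors <= k_meters))   # share of "hits" under the threshold


# Three episodes with errors of 0 m, 5 m and 10 m: two of three count as hits.
print(acc_at_k([[0, 0], [3, 4], [10, 0]], [[0, 0], [0, 0], [0, 0]]))
```

Under this reading, an improvement such as "+10.85% in Acc5@valUnseen" means a larger share of validation-unseen episodes are localized within the 5 m threshold.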

Multimodal learning has advanced performance on many vision-language tasks. However, most existing work in embodied dialog research focuses on navigation and leaves the localization task understudied. The few existing dialog-based localization approaches assume that the entire dialog is available before localization, which is impractical for deployed dialog-based localization. In this paper, we propose DiaLoc, a new dialog-based localization framework that aligns with the behavior of a real human operator. Specifically, we iteratively refine location predictions, so that the current pose belief can be visualized after each dialog turn. DiaLoc effectively utilizes multimodal data for multi-shot localization, where a fusion encoder fuses vision and dialog information iteratively. We achieve state-of-the-art results on the embodied dialog-based localization task in both single-shot (+7.08% in Acc5@valUnseen) and multi-shot settings (+10.85% in Acc5@valUnseen). DiaLoc narrows the gap between simulation and real-world applications, opening doors for future research on collaborative localization and navigation.
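To make the turn-by-turn idea concrete, here is a toy sketch of iterative localization: after each dialog turn the running query is updated, every map cell is scored against it, and a normalized heatmap (the current location belief) and its argmax are emitted. Everything here is hypothetical: `map_feats`, `turn_embs`, the moving-average "fusion" and the dot-product scoring are simple stand-ins for DiaLoc's learned fusion encoder, not its actual architecture.

```python
import numpy as np


def softmax_2d(scores):
    """Normalize a 2D score grid into a probability heatmap."""
    flat = scores.reshape(-1)
    flat = np.exp(flat - flat.max())  # subtract max for numerical stability
    return (flat / flat.sum()).reshape(scores.shape)


def iterative_localize(map_feats, turn_embs, momentum=0.5):
    """Yield (heatmap, (row, col)) after each dialog turn.

    map_feats: (H, W, D) hypothetical per-cell visual features.
    turn_embs: iterable of (D,) hypothetical dialog-turn embeddings.
    Toy fusion (assumption): an exponential moving average of turn embeddings,
    scored against map cells by dot product. Not DiaLoc's fusion encoder.
    """
    query = None
    for emb in turn_embs:
        emb = np.asarray(emb, dtype=float)
        # Fold the new turn into the running dialog state.
        query = emb if query is None else momentum * query + (1 - momentum) * emb
        scores = map_feats @ query    # (H, W) similarity per map cell
        heatmap = softmax_2d(scores)  # belief after this turn
        row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
        yield heatmap, (int(row), int(col))


# Toy 2x2 map with 2-dimensional cell features.
map_feats = np.array([[[1.0, 0.0], [0.0, 1.0]],
                      [[0.0, 0.0], [1.0, 1.0]]])
for turn, (heatmap, cell) in enumerate(iterative_localize(map_feats, [[1, -0.5], [0, 1]]), 1):
    print(turn, cell)
```

In this toy run the first turn points to cell (0, 0), and the second turn shifts the belief to (1, 1). This illustrates why a prediction is available, and can be visualized, after every turn rather than only once the full dialog is known.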
