CL, CV | Oct 24, 2022

Are Current Decoding Strategies Capable of Facing the Challenges of Visual Dialogue?

arXiv:2210.12997v1 | 297 citations | h-index: 8
Originality: Incremental advance
AI Analysis

This work identifies limitations in current decoding strategies for goal-oriented multimodal systems, offering a foundation for improved algorithms in Visual Dialogue tasks.

The study evaluated a range of decoding strategies in a Visual Dialogue referential game to see how they handle challenges such as grounding and informativeness. None of the strategies balanced lexical richness, task accuracy, and visual grounding effectively, but the analysis provided insights into the strengths and weaknesses of each one.

Decoding strategies play a crucial role in natural language generation systems. They are usually designed and evaluated in open-ended text-only tasks, and it is not clear how different strategies handle the numerous challenges that goal-oriented multimodal systems face (such as grounding and informativeness). To answer this question, we compare a wide variety of different decoding strategies and hyper-parameter configurations in a Visual Dialogue referential game. Although none of them successfully balance lexical richness, accuracy in the task, and visual grounding, our in-depth analysis allows us to highlight the strengths and weaknesses of each decoding strategy. We believe our findings and suggestions may serve as a starting point for designing more effective decoding algorithms that handle the challenges of Visual Dialogue tasks.

