Asking More Informative Questions for Grounded Retrieval
This work addresses the challenge of interactive information gathering for AI systems in visual domains. It offers a new approach that improves both efficiency and accuracy, though its advance over prior question-asking methods is incremental.
The paper tackles the problem of limited information gain in grounded multi-turn image identification tasks by enabling models to ask open-ended questions rather than only yes/no ones, yielding a 14% accuracy increase over the previous state of the art and 48% more efficient games in human evaluations.
When a model is trying to gather information in an interactive setting, it benefits from asking informative questions. However, in the case of a grounded multi-turn image identification task, previous studies have been constrained to polar yes/no questions, limiting how much information the model can gain in a single turn. We present an approach that formulates more informative, open-ended questions. In doing so, we discover that off-the-shelf visual question answering (VQA) models often make presupposition errors, which standard information gain question selection methods fail to account for. To address this issue, we propose a method that can incorporate presupposition handling into both question selection and belief updates. Specifically, we use a two-stage process, where the model first filters out images which are irrelevant to a given question, then updates its beliefs about which image the user intends. Through self-play and human evaluations, we show that our method is successful in asking informative open-ended questions, increasing accuracy over the past state-of-the-art by 14%, while resulting in 48% more efficient games in human evaluations.
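The abstract describes two ideas: selecting questions by information gain, and a two-stage update that first filters out irrelevant images and then updates beliefs. The sketch below illustrates both in a minimal form. It is not the authors' implementation. The function names, the per-image `relevance` scores, and the `answer_probs` dictionaries (standing in for VQA model outputs) are all assumptions introduced for illustration, and the expected information gain is the standard mutual-information formula rather than the paper's presupposition-aware variant.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def normalize(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("all candidate images were ruled out")
    return [w / total for w in weights]


def expected_information_gain(belief, answer_probs):
    """Expected entropy reduction over images from asking one question.

    belief[i] is the current probability that image i is the target.
    answer_probs[i] maps each possible answer to P(answer | image i);
    these dictionaries are hypothetical stand-ins for VQA outputs.
    """
    answers = set().union(*(d.keys() for d in answer_probs))
    gain = entropy(belief)
    for a in answers:
        joint = [b * d.get(a, 0.0) for b, d in zip(belief, answer_probs)]
        p_a = sum(joint)
        if p_a > 0:
            gain -= p_a * entropy([j / p_a for j in joint])
    return gain


def two_stage_update(belief, relevance, answer_likelihood):
    """Assumed two-stage belief update.

    Stage 1: weight each image by relevance[i] in [0, 1], i.e. how well
    the question's presupposition holds for that image (0 filters it out).
    Stage 2: Bayesian update using P(observed answer | image i).
    """
    filtered = [b * r for b, r in zip(belief, relevance)]
    posterior = [f * l for f, l in zip(filtered, answer_likelihood)]
    return normalize(posterior)


# Four candidate images with a uniform prior.
prior = [0.25, 0.25, 0.25, 0.25]

# A polar question splits the images two and two.
polar = [{"yes": 1.0}, {"yes": 1.0}, {"no": 1.0}, {"no": 1.0}]
# An open-ended question gets a different answer for every image.
open_ended = [{"red": 1.0}, {"blue": 1.0}, {"green": 1.0}, {"white": 1.0}]

eig_polar = expected_information_gain(prior, polar)
eig_open = expected_information_gain(prior, open_ended)

# The question presupposes an object that image 2 does not contain.
posterior = two_stage_update(
    prior,
    relevance=[1.0, 1.0, 0.0, 1.0],
    answer_likelihood=[0.9, 0.1, 0.9, 0.1],
)
```

In this toy setup the open-ended question has the higher expected information gain, because each answer identifies a single image, whereas the polar question can at best halve the candidate set. In the update, image 2 is removed in the first stage even though the answer likelihood alone would have favored it, which is the kind of presupposition error the abstract attributes to off-the-shelf VQA models.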