LG · CL · CV · Oct 15, 2021

Guiding Visual Question Generation

arXiv:2110.08226v3629 citations
Originality: Incremental advance
AI Analysis

This paper addresses the arbitrary choice of question target in VQG. The advance is incremental because it builds on existing VQG methods.

The paper tackles Visual Question Generation (VQG), where models struggle because most images admit multiple valid questions. It introduces guided variants that condition the generator on categorical information, and reports an improvement of over 9 BLEU-4 over the state of the art.

In traditional Visual Question Generation (VQG), most images have multiple concepts (e.g. objects and categories) for which a question could be generated, but models are trained to mimic an arbitrary choice of concept as given in their training data. This makes training difficult and also poses issues for evaluation -- multiple valid questions exist for most images but only one or a few are captured by the human references. We present Guiding Visual Question Generation - a variant of VQG which conditions the question generator on categorical information based on expectations on the type of question and the objects it should explore. We propose two variants: (i) an explicitly guided model that enables an actor (human or automated) to select which objects and categories to generate a question for; and (ii) an implicitly guided model that learns which objects and categories to condition on, based on discrete latent variables. The proposed models are evaluated on an answer-category augmented VQA dataset and our quantitative results show a substantial improvement over the current state of the art (over 9 BLEU-4 increase). Human evaluation validates that guidance helps the generation of questions that are grammatically coherent and relevant to the given image and objects.
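
To make the explicitly guided variant more concrete, the sketch below shows one way an actor's choice of category and objects could be turned into a conditioning input for a question decoder. It is only an illustration under stated assumptions, not the paper's implementation: the category list, the `Guidance` class, the `[CAT]`/`[OBJ]` marker tokens, and both function names are hypothetical and do not come from the abstract.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical category vocabulary (assumption): the abstract does not
# list the answer categories used in the augmented VQA dataset.
CATEGORIES = ["object", "color", "count", "location"]


@dataclass
class Guidance:
    """The guidance signal: one question category plus the objects to ask about."""
    category: str
    objects: List[str]


def select_guidance(detected_objects: Sequence[str],
                    category: str,
                    chosen: Sequence[str]) -> Guidance:
    """Explicit guidance: an actor (human or automated) picks a category
    and a subset of the objects detected in the image."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    missing = [o for o in chosen if o not in detected_objects]
    if missing:
        raise ValueError(f"objects not detected in image: {missing}")
    return Guidance(category=category, objects=list(chosen))


def build_conditioning_tokens(g: Guidance) -> List[str]:
    """Flatten the guidance into a token sequence that a question decoder
    could be conditioned on. The marker tokens are illustrative only."""
    return ["[CAT]", g.category, "[OBJ]", *g.objects]


# Usage: the image contains three detected objects; the actor asks for a
# colour question about the frisbee only.
detected = ["dog", "frisbee", "grass"]
guidance = select_guidance(detected, "color", ["frisbee"])
tokens = build_conditioning_tokens(guidance)
print(tokens)  # ['[CAT]', 'color', '[OBJ]', 'frisbee']
```

In the implicitly guided variant described above, the same kind of category and object selection would be learned through discrete latent variables rather than supplied by an actor; that part is not sketched here.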

