CL | CV | LG | Nov 14, 2022

Multi-VQG: Generating Engaging Questions for Multiple Images

arXiv:2211.07441v2 | 295 citations | h-index: 27
Originality: Incremental advance
AI Analysis

The paper addresses the limited ability of traditional visual question generation (VQG) to handle time-series information, with applications such as creativity and experience sharing in view. The contribution is incremental: it extends VQG from single images to multiple images.

The paper tackles the problem of generating engaging questions from multiple images rather than a single image, to better capture time-series information and promote awareness. The authors introduce the MVQG dataset and show that models which build a story from the image sequence can generate engaging questions, supporting their assumption that people construct a mental picture of the event before asking.

Generating engaging content has drawn much recent attention in the NLP community. Asking questions is a natural way to respond to photos and promote awareness. However, most answers to questions in traditional question-answering (QA) datasets are factoids, which reduce individuals' willingness to answer. Furthermore, traditional visual question generation (VQG) confines the source data for question generation to single images, resulting in a limited ability to comprehend time-series information of the underlying event. In this paper, we propose generating engaging questions from multiple images. We present MVQG, a new dataset, and establish a series of baselines, including both end-to-end and dual-stage architectures. Results show that building stories behind the image sequence enables models to generate engaging questions, which confirms our assumption that people typically construct a picture of the event in their minds before asking questions. These results open up an exciting challenge for visual-and-language models to implicitly construct a story behind a series of photos to allow for creativity and experience sharing and hence draw attention to downstream applications.
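The abstract contrasts end-to-end and dual-stage baselines without giving details here. As a rough illustration only, the difference can be sketched as follows, assuming that "dual-stage" means building a story from the image sequence first and then asking a question about that story. Every function name below is hypothetical, and the toy stand-ins are plain string functions, not the authors' models.

```python
from typing import Callable, Sequence

# Hypothetical component signatures; none of these names come from the paper.
ImagesToQuestion = Callable[[Sequence[str]], str]
ImagesToStory = Callable[[Sequence[str]], str]
StoryToQuestion = Callable[[str], str]


def end_to_end(images: Sequence[str], images_to_question: ImagesToQuestion) -> str:
    """One stage: a single model maps the image sequence to a question."""
    return images_to_question(images)


def dual_stage(
    images: Sequence[str],
    images_to_story: ImagesToStory,
    story_to_question: StoryToQuestion,
) -> str:
    """Two stages: build a story behind the images, then ask about it."""
    story = images_to_story(images)
    return story_to_question(story)


# Toy stand-ins so the sketch runs; real systems would use trained models.
def toy_images_to_question(images: Sequence[str]) -> str:
    return f"What is happening in these {len(images)} photos?"


def toy_images_to_story(images: Sequence[str]) -> str:
    return "First " + ", then ".join(images) + "."


def toy_story_to_question(story: str) -> str:
    return f"What happened next? ({story})"


if __name__ == "__main__":
    photos = ["packing bags", "boarding a train", "arriving at the beach"]
    print(end_to_end(photos, toy_images_to_question))
    print(dual_stage(photos, toy_images_to_story, toy_story_to_question))
```

The sketch only makes the control flow concrete: in the dual-stage variant the question generator sees an explicit intermediate story, which is the property the reported results credit for more engaging questions.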

Code Implementations: 1 repo