CV · CL · HC · LG · Nov 30, 2019

A Free Lunch in Generating Datasets: Building a VQG and VQA System with Attention and Humans in the Loop

arXiv:1912.00124v2
Originality: Incremental advance
AI Analysis

The paper addresses the problem of costly, static dataset creation for AI researchers, though the advance is incremental because it builds on existing VQA and VQG methods.

The paper tackles the challenge of acquiring large datasets for AI by proposing a system that generates visual questions, poses them to social media users, and collects the responses to build a Visual Question Answering (VQA) dataset at low cost. It also presents two models that parse answers from noisy human responses significantly better than the authors' baselines.

Despite their importance in training artificial intelligence systems, large datasets remain challenging to acquire. For example, the ImageNet dataset required fourteen million labels of basic human knowledge, such as whether an image contains a chair. Unfortunately, this knowledge is simple enough to be tedious for human annotators, yet tacit enough that human annotators remain necessary. Human collaborative efforts for tasks like labeling massive amounts of data are costly, inconsistent, and prone to failure, and they do not resolve the problem that the resulting dataset is static. What if we asked people questions they want to answer and collected their responses as data? We could then gather data at a much lower cost, and expanding a dataset would simply become a matter of asking more questions. We focus on the task of Visual Question Answering (VQA) and propose a system that uses Visual Question Generation (VQG) to produce questions, poses them to social media users, and collects their responses. We present two models that parse clean answers from the noisy human responses significantly better than our baselines, with the goal of eventually incorporating the answers into a VQA dataset. By demonstrating how our system can collect large amounts of data at little to no cost, we envision similar systems being used to improve performance on other tasks in the future.
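
The generate, ask, and parse loop described in the abstract can be sketched as follows. This is only an illustrative outline, not the authors' implementation: `collect_vqa_examples`, `VQAExample`, and the three callables are hypothetical names, and the toy keyword matcher merely stands in for the paper's answer-parsing models, whose details are not given here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class VQAExample:
    """One (image, question, answer) triple destined for a VQA dataset."""
    image_id: str
    question: str
    answer: str


def collect_vqa_examples(
    image_ids: Iterable[str],
    generate_question: Callable[[str], str],
    ask_users: Callable[[str, str], List[str]],
    parse_answer: Callable[[str, str], Optional[str]],
) -> List[VQAExample]:
    """Run one pass of the generate -> ask -> parse loop.

    All three callables are hypothetical stand-ins:
    generate_question plays the role of a VQG model,
    ask_users the role of posting to social media and gathering replies,
    parse_answer the role of extracting a clean answer from a noisy reply.
    """
    dataset: List[VQAExample] = []
    for image_id in image_ids:
        question = generate_question(image_id)
        for response in ask_users(image_id, question):
            answer = parse_answer(question, response)
            # Keep only replies from which a clean answer could be extracted.
            if answer is not None:
                dataset.append(VQAExample(image_id, question, answer))
    return dataset


# Toy stand-ins, purely for illustration.
def toy_generate_question(image_id: str) -> str:
    return "What color is the chair?"


def toy_ask_users(image_id: str, question: str) -> List[str]:
    return ["I think it's red!", "no idea lol"]


def toy_parse_answer(question: str, response: str) -> Optional[str]:
    colors = {"red", "blue", "green"}
    for token in response.lower().replace("!", " ").split():
        if token in colors:
            return token
    return None  # No recognizable answer in this reply.


examples = collect_vqa_examples(
    ["img1"], toy_generate_question, toy_ask_users, toy_parse_answer
)
```

Under these assumptions, expanding the dataset amounts to calling the loop again with more images or more questions, which mirrors the abstract's point that growth becomes a matter of asking more questions.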

