AI, CL, CV, HC | Aug 29, 2016

Visual Question: Predicting If a Crowd Will Agree on the Answer

arXiv:1608.08188v1 | 1 citation
Originality: Incremental advance
AI Analysis

The paper addresses the efficiency of crowdsourcing answers for VQA. It offers a practical improvement for researchers and developers, though the advance is incremental because it builds on existing crowdsourcing systems.

The paper tackles inconsistent human agreement on answers in visual question answering (VQA). It trains a model to predict whether a crowd will agree on a single answer, then uses those predictions to allocate crowdsourcing effort, typically eliminating at least 20% of human effort with no loss of the information collected.

Visual question answering (VQA) systems are emerging from a desire to empower users to ask any natural language question about visual content and receive a valid answer in response. However, close examination of the VQA problem reveals an unavoidable, entangled problem that multiple humans may or may not always agree on a single answer to a visual question. We train a model to automatically predict from a visual question whether a crowd would agree on a single answer. We then propose how to exploit this system in a novel application to efficiently allocate human effort to collect answers to visual questions. Specifically, we propose a crowdsourcing system that automatically solicits fewer human responses when answer agreement is expected and more human responses when answer disagreement is expected. Our system improves upon existing crowdsourcing systems, typically eliminating at least 20% of human effort with no loss to the information collected from the crowd.
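The allocation idea in the abstract (fewer responses when agreement is expected, more when disagreement is expected) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `allocate_answers` and `effort_saved`, the 0.5 threshold, and the budgets of 1 and 5 answers. The agreement probabilities are stand-ins for the output of a trained predictor, which this sketch does not implement.

```python
from typing import List


def allocate_answers(agreement_prob: float,
                     threshold: float = 0.5,
                     few: int = 1,
                     many: int = 5) -> int:
    """Return how many crowd answers to request for one visual question.

    agreement_prob is a (hypothetical) predicted probability that the
    crowd would agree on a single answer. The threshold and the two
    budgets are illustrative values, not the paper's settings.
    """
    if not 0.0 <= agreement_prob <= 1.0:
        raise ValueError("agreement_prob must be in [0, 1]")
    # Agreement expected: one answer likely suffices.
    # Disagreement expected: collect the full set of answers.
    return few if agreement_prob >= threshold else many


def effort_saved(agreement_probs: List[float],
                 threshold: float = 0.5,
                 few: int = 1,
                 many: int = 5) -> float:
    """Fraction of answers saved versus always requesting `many` answers."""
    if not agreement_probs:
        return 0.0
    requested = sum(
        allocate_answers(p, threshold=threshold, few=few, many=many)
        for p in agreement_probs
    )
    baseline = many * len(agreement_probs)
    return 1.0 - requested / baseline


if __name__ == "__main__":
    # Made-up predictions for four visual questions.
    probs = [0.9, 0.2, 0.7, 0.4]
    print([allocate_answers(p) for p in probs])
    print(effort_saved(probs))
```

The savings computed this way depend entirely on the chosen threshold and budgets, and they say nothing about whether information is lost when the predictor is wrong; the paper's reported figure comes from its own evaluation, not from this toy calculation.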

