AI CL CV LG | Aug 31, 2016

Measuring Machine Intelligence Through Visual Question Answering

C. Lawrence Zitnick, Aishwarya Agrawal, Stanislaw Antol, Margaret Mitchell, Dhruv Batra, Devi Parikh

arXiv:1608.08716v1 | 17.438 citations

Originality: Synthesis-oriented

AI Analysis

This paper addresses the need for better benchmarks in AI evaluation, though its contribution is incremental in that it builds on existing tasks such as image captioning.

The paper tackles the problem of measuring machine intelligence by proposing Visual Question Answering as a more effective task than image captioning, and introduces a large dataset with over 760,000 human-generated questions and around 10 million answers for evaluation.

As machines have become more intelligent, there has been a renewed interest in methods for measuring their intelligence. A common approach is to propose tasks at which humans excel but which machines find difficult. However, an ideal task should also be easy to evaluate and not be easily gameable. We begin with a case study exploring the recently popular task of image captioning and its limitations as a task for measuring machine intelligence. An alternative and more promising task is Visual Question Answering, which tests a machine's ability to reason about language and vision. We describe a dataset, unprecedented in size, created for the task that contains over 760,000 human-generated questions about images. Using around 10 million human-generated answers, machines may be easily evaluated.
