Hard to Cheat: A Turing Test based on Answering Questions about Images
This work addresses the challenge of evaluating holistic machine intelligence in a way that is less prone to over-interpretation, a problem of broad relevance to the AI research community.
The paper proposes question answering about images as a robust version of the Turing Test for AI, contrasting it with tasks like grounding and description generation, and discusses tools to measure progress in this field.
Progress in language and image understanding by machines has sparked the interest of the research community in more open-ended, holistic tasks, and refueled an old AI dream of building intelligent machines. We discuss a few prominent challenges that characterize such holistic tasks and argue for "question answering about images" as a particularly appealing instance of such a holistic task. In particular, we point out that it is a version of a Turing Test that is likely to be more robust to over-interpretations, and we contrast it with tasks like grounding and generation of descriptions. Finally, we discuss tools to measure progress in this field.