LG CL CV NE ML
Apr 19, 2019

Challenges and Prospects in Vision and Language Research

Kushal Kafle, Robik Shrestha, Christopher Kanan

arXiv:1904.09317v2 15.843 citations

Originality: Synthesis-oriented

AI Analysis

This paper addresses a problem that matters to researchers and developers: evaluation results that overstate progress in AI. Its contribution is incremental, however, because it builds on existing critiques.

The paper reviews how current vision-language systems achieve high performance due to dataset and evaluation flaws rather than genuine intelligence, and proposes a path forward for more robust benchmarks.

Language grounded image understanding tasks have often been proposed as a method for evaluating progress in artificial intelligence. Ideally, these tasks should test a plethora of capabilities that integrate computer vision, reasoning, and natural language understanding. However, rather than behaving as visual Turing tests, recent studies have demonstrated state-of-the-art systems are achieving good performance through flaws in datasets and evaluation procedures. We review the current state of affairs and outline a path forward.
