CV | AI | LG | Mar 7, 2019

RAVEN: A Dataset for Relational and Analogical Visual rEasoNing

arXiv:1903.02741v1 | 394 citations
Originality: Incremental advance
AI Analysis

This work addresses the performance gap between AI systems and humans on higher-level vision problems, focusing on abstract reasoning. It is incremental in that it builds on existing RPM-based datasets by adding a structure representation.

The authors address high-level visual reasoning by introducing the RAVEN dataset, which uses Raven's Progressive Matrices to link vision with structural, relational, and analogical reasoning. They report consistent improvement across all models after incorporating a simple neural module that combines visual understanding and structure reasoning.

Dramatic progress has been witnessed in basic vision tasks involving low-level perception, such as object recognition, detection, and tracking. Unfortunately, there is still an enormous performance gap between artificial vision systems and human intelligence in terms of higher-level vision problems, especially ones involving reasoning. Earlier attempts in equipping machines with high-level reasoning have hovered around Visual Question Answering (VQA), one typical task associating vision and language understanding. In this work, we propose a new dataset, built in the context of Raven's Progressive Matrices (RPM) and aimed at lifting machine intelligence by associating vision with structural, relational, and analogical reasoning in a hierarchical representation. Unlike previous works in measuring abstract reasoning using RPM, we establish a semantic link between vision and reasoning by providing structure representation. This addition enables a new type of abstract reasoning by jointly operating on the structure representation. Machine reasoning ability using modern computer vision is evaluated in this newly proposed dataset. Additionally, we also provide human performance as a reference. Finally, we show consistent improvement across all models by incorporating a simple neural module that combines visual understanding and structure reasoning.

