AI, CV, IR | May 24, 2017

How a General-Purpose Commonsense Ontology can Improve Performance of Learning-Based Image Retrieval

arXiv:1705.08844v1 | 1 citation
Originality: Incremental advance
AI Analysis

This work addresses visual reasoning tasks by integrating rule-based knowledge sources into learned vision systems. It is an incremental advance: it adds a knowledge-filtering step on top of existing deep learning methods.

The paper tackles sentence-based image retrieval by incorporating a general-purpose commonsense ontology (ConceptNet) into state-of-the-art vision systems. It shows that the ontology improves performance on a common benchmark dataset once it is filtered for visually relevant relations.

The knowledge representation community has built general-purpose ontologies which contain large amounts of commonsense knowledge over relevant aspects of the world, including useful visual information, e.g., "a ball is used by a football player", "a tennis player is located at a tennis court". Current state-of-the-art approaches for visual recognition do not exploit these rule-based knowledge sources. Instead, they learn recognition models directly from training examples. In this paper, we study how general-purpose ontologies (specifically, MIT's ConceptNet ontology) can improve the performance of state-of-the-art vision systems. As a testbed, we tackle the problem of sentence-based image retrieval. Our retrieval approach incorporates knowledge from ConceptNet on top of a large pool of object detectors derived from a deep learning technique. In our experiments, we show that ConceptNet can improve performance on a common benchmark dataset. Key to our performance is the use of the ESPGAME dataset to select visually relevant relations from ConceptNet. Consequently, a main conclusion of this work is that general-purpose commonsense ontologies improve performance on visual reasoning tasks when properly filtered to select meaningful visual relations.
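
The pipeline described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not state how relations are filtered or how scores are combined, so the co-occurrence threshold (`min_count`), the down-weighting of related concepts (`related_weight`), all function names, the relation labels, and the toy data below are assumptions made for illustration only. The sketch keeps a commonsense relation only if its two concepts co-occur in human-tagged images (standing in for ESPGAME), then uses the surviving relations to expand a query before scoring images by object-detector confidences.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tagged_images):
    """Count how often each unordered pair of tags labels the same image."""
    counts = Counter()
    for tags in tagged_images:
        for a, b in combinations(sorted(set(tags)), 2):
            counts[(a, b)] += 1
    return counts


def filter_relations(relations, counts, min_count=2):
    """Keep relations whose two concepts co-occur in at least min_count images.

    Assumption: co-occurrence in image tags is used as a proxy for
    'visually relevant'; the threshold is hypothetical.
    """
    kept = []
    for head, rel, tail in relations:
        key = tuple(sorted((head, tail)))
        if counts[key] >= min_count:
            kept.append((head, rel, tail))
    return kept


def expand_query(words, relations):
    """Add every concept linked to a query word by a kept relation."""
    expanded = set(words)
    for head, _, tail in relations:
        if head in words:
            expanded.add(tail)
        if tail in words:
            expanded.add(head)
    return expanded


def rank_images(query_words, detector_scores, relations, related_weight=0.5):
    """Rank images by detector confidence over the expanded query.

    Query words get weight 1.0; concepts added through the ontology get
    related_weight (a hypothetical choice, not taken from the paper).
    """
    expanded = expand_query(query_words, relations)
    ranking = []
    for image_id, scores in detector_scores.items():
        total = 0.0
        for concept in expanded:
            weight = 1.0 if concept in query_words else related_weight
            total += weight * scores.get(concept, 0.0)
        ranking.append((image_id, total))
    return sorted(ranking, key=lambda item: item[1], reverse=True)


# Toy (head, relation, tail) triples; relation labels are illustrative.
relations = [
    ("ball", "RelatedTo", "football player"),
    ("tennis player", "AtLocation", "tennis court"),
    ("ball", "RelatedTo", "dance"),  # a non-visual sense of "ball"
]

# Toy stand-in for human image tags such as those in ESPGAME.
tagged_images = [
    ["ball", "football player", "grass"],
    ["ball", "football player"],
    ["tennis player", "tennis court", "racket"],
    ["tennis player", "tennis court"],
    ["ball", "dance"],
]

# Toy detector confidences per image (hypothetical values).
detector_scores = {
    "img_a": {"tennis player": 0.6, "tennis court": 0.8},
    "img_b": {"tennis player": 0.7},
}

visual_relations = filter_relations(relations, cooccurrence_counts(tagged_images))
query = {"tennis player"}
with_ontology = rank_images(query, detector_scores, visual_relations)
without_ontology = rank_images(query, detector_scores, [])
```

In this toy data the "ball"/"dance" relation is dropped because the pair co-occurs in only one tagged image. With the filtered relations, `img_a` moves ahead of `img_b` for the query "tennis player" because its tennis-court detection now contributes to the score; without the ontology, `img_b` ranks first.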

Code Implementations: 1 repo
