AI · CL · CV · Nov 3, 2019

Scene Graph based Image Retrieval -- A case study on the CLEVR Dataset

arXiv:1911.00850v1 · 15 citations
Originality: Incremental advance
AI Analysis

This paper addresses the difficulty of incorporating pragmatic search strategies into large-scale text-based image retrieval. The advance is incremental, since it builds on existing neural methods.

The paper tackles the problem of text-based image retrieval by proposing a neural-symbolic approach that uses scene graphs and graph matching, achieving a retrieval accuracy of 85% on the CLEVR dataset.

With the proliferation of multimodal interaction in various domains, there has recently been much interest in text-based image retrieval in the computer vision community. However, most state-of-the-art techniques model this problem in a purely neural way, which makes it difficult to incorporate pragmatic strategies when searching a large-scale catalog, especially when the search requirements are insufficient and the model needs to resort to an interactive retrieval process through multiple iterations of question answering. Motivated by this, we propose a neural-symbolic approach for one-shot retrieval of images from a large-scale catalog, given the caption description. To facilitate this, we represent the catalog and caption as scene graphs and model the retrieval task as a learnable graph matching problem, trained end-to-end with the REINFORCE algorithm. Further, we briefly describe an extension of this pipeline to an iterative retrieval framework based on interactive questioning and answering.
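To make the idea of retrieval as scene-graph matching concrete, here is a minimal Python sketch. It is not the paper's method: the matcher below is a hand-written, brute-force score rather than a learned one, and the REINFORCE training is not shown. The graph layout (an `objects` list of attribute dictionaries and a `relations` set of index triples), the function names `match_score` and `retrieve`, and the two toy CLEVR-style scenes are all assumptions made for illustration.

```python
from itertools import permutations


def match_score(query, scene):
    """Fraction of query attributes and relations satisfied by the best
    injective mapping of query objects onto scene objects.

    Hypothetical stand-in for a learnable matcher: it tries every mapping,
    so it is only practical for very small graphs.
    """
    q_objs, s_objs = query["objects"], scene["objects"]
    if len(q_objs) > len(s_objs):
        return 0.0  # the scene cannot contain every query object
    total = sum(len(o) for o in q_objs) + len(query["relations"])
    if total == 0:
        return 0.0
    best = 0
    for assign in permutations(range(len(s_objs)), len(q_objs)):
        # Count attribute values that agree under this mapping.
        attr_hits = sum(
            1
            for qi, si in enumerate(assign)
            for key, value in q_objs[qi].items()
            if s_objs[si].get(key) == value
        )
        # Count query relations that also hold between the mapped objects.
        rel_hits = sum(
            1
            for (a, rel, b) in query["relations"]
            if (assign[a], rel, assign[b]) in scene["relations"]
        )
        best = max(best, attr_hits + rel_hits)
    return best / total


def retrieve(query, catalog):
    """Return the name of the catalog entry with the highest match score."""
    return max(catalog, key=lambda name: match_score(query, catalog[name]))


# Toy catalog of two scenes (illustrative data, not taken from CLEVR).
catalog = {
    "img_a": {
        "objects": [
            {"color": "red", "shape": "cube"},
            {"color": "blue", "shape": "sphere"},
        ],
        "relations": {(0, "left_of", 1)},
    },
    "img_b": {
        "objects": [
            {"color": "red", "shape": "sphere"},
            {"color": "blue", "shape": "cube"},
        ],
        "relations": {(1, "left_of", 0)},
    },
}

# Scene graph for the caption "a red cube left of a blue sphere".
query = {
    "objects": [
        {"color": "red", "shape": "cube"},
        {"color": "blue", "shape": "sphere"},
    ],
    "relations": {(0, "left_of", 1)},
}

best_image = retrieve(query, catalog)
```

In this toy setup, `img_a` satisfies every attribute and relation in the caption graph, while `img_b` satisfies only part of it, so `retrieve` returns `img_a`. A learned matcher would replace the exhaustive search with trainable node and edge similarities.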

