IR · Sep 2, 2019

Know2Look: Commonsense Knowledge for Visual Search

arXiv:1909.00749v1 · 1090 citations
AI Analysis

This paper addresses the challenge of imperfect visual detection in image-text retrieval for social media and web applications, though it appears incremental because it builds on existing multimodal methods.

The paper tackles the problem of improving search and retrieval of documents that contain images by incorporating commonsense knowledge about query terms. The result is a multimodal approach that combines text, visual cues, and commonsense knowledge for more efficient search and retrieval.

With the rise in popularity of social media, images accompanied by contextual text form a huge section of the web. However, search and retrieval of documents still depend largely on textual cues alone. Although visual cues have started to gain focus, imperfections in object/scene detection mean that they do not lead to significantly improved results. We hypothesize that the use of background commonsense knowledge on query terms can significantly aid in the retrieval of documents with associated images. To this end we deploy three different modalities - text, visual cues, and commonsense knowledge pertaining to the query - as a recipe for efficient search and retrieval.
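To make the idea of combining the three modalities concrete, here is a minimal sketch of one possible way to fuse them: a weighted sum of term-overlap scores. This is an illustration only, not the paper's ranking model. The `Document` type, the `COMMONSENSE` lookup table, the overlap measure, and the weights are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class Document:
    doc_id: str
    text_terms: Set[str]     # words from the caption or surrounding text
    visual_labels: Set[str]  # labels from an (imperfect) object/scene detector


# Hypothetical commonsense store: query term -> related concepts.
COMMONSENSE = {
    "picnic": {"basket", "grass", "park", "blanket"},
}


def overlap(query_terms, doc_terms):
    """Fraction of query_terms that also appear in doc_terms."""
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)


def score(query, doc, weights=(0.5, 0.25, 0.25)):
    """Weighted sum of text, visual, and commonsense evidence (assumed weights)."""
    q = set(query.lower().split())
    # Expand the query with commonsense concepts related to each query term.
    expanded = set().union(*(COMMONSENSE.get(t, set()) for t in q))
    w_text, w_visual, w_cs = weights
    text_score = overlap(q, doc.text_terms)
    visual_score = overlap(q, doc.visual_labels)
    # Commonsense concepts may be matched by either the text or the detector.
    cs_score = overlap(expanded, doc.text_terms | doc.visual_labels)
    return w_text * text_score + w_visual * visual_score + w_cs * cs_score


def rank(query, docs):
    """Return docs sorted from highest to lowest fused score."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)
```

In this sketch, a photo whose caption never mentions "picnic" and whose detector only finds "basket" and "grass" can still be ranked for the query "picnic", because the commonsense expansion links those concepts to the query term.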

