Know2Look: Commonsense Knowledge for Visual Search
This work addresses imperfect visual detection in image-text retrieval for social media and web applications, though it appears incremental because it builds on existing multimodal methods.
The paper tackles the problem of improving search and retrieval of documents with images by incorporating commonsense knowledge about query terms. The result is a multimodal approach that combines text, visual cues, and commonsense knowledge for more efficient search and retrieval.
With the rise in popularity of social media, images accompanied by contextual text form a huge section of the web. However, search and retrieval of documents still depend largely on textual cues alone. Although visual cues have started to gain focus, imperfections in object/scene detection mean that they do not lead to significantly improved results. We hypothesize that the use of background commonsense knowledge on query terms can significantly aid in retrieval of documents with associated images. To this end, we deploy three different modalities (text, visual cues, and commonsense knowledge pertaining to the query) as a recipe for efficient search and retrieval.
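As a rough illustration of how three modalities could be combined for ranking, the sketch below scores each document by a weighted sum of a text match, a visual match, and a commonsense match. It is not the paper's actual formulation: the `Document` type, the `COMMONSENSE` lookup table, the scoring functions, and the weights are all hypothetical and chosen only to make the idea concrete.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    # Words from the text accompanying the image.
    text_tokens: set = field(default_factory=set)
    # Detected object/scene label -> detector confidence in [0, 1].
    visual_labels: dict = field(default_factory=dict)


# Hypothetical commonsense store: query term -> related concepts with weights.
COMMONSENSE = {
    "picnic": {"basket": 0.8, "grass": 0.6, "park": 0.7},
}


def text_score(query, doc):
    """Fraction of query terms that appear in the document text."""
    if not query:
        return 0.0
    return sum(1 for term in query if term in doc.text_tokens) / len(query)


def visual_score(query, doc):
    """Mean detector confidence for query terms detected in the image."""
    if not query:
        return 0.0
    return sum(doc.visual_labels.get(term, 0.0) for term in query) / len(query)


def commonsense_score(query, doc):
    """Weighted share of commonsense-related concepts found in text or image."""
    total, norm = 0.0, 0.0
    for term in query:
        for concept, weight in COMMONSENSE.get(term, {}).items():
            in_text = 1.0 if concept in doc.text_tokens else 0.0
            evidence = max(doc.visual_labels.get(concept, 0.0), in_text)
            total += weight * evidence
            norm += weight
    return total / norm if norm else 0.0


def combined_score(query, doc, weights=(0.4, 0.3, 0.3)):
    """Linear fusion of the three modality scores (weights are assumptions)."""
    w_text, w_visual, w_cs = weights
    return (w_text * text_score(query, doc)
            + w_visual * visual_score(query, doc)
            + w_cs * commonsense_score(query, doc))


def rank(query, docs, weights=(0.4, 0.3, 0.3)):
    """Return documents sorted by combined score, best first."""
    return sorted(docs, key=lambda d: combined_score(query, d, weights), reverse=True)
```

In this sketch, a document whose text and detected labels never mention "picnic" can still be retrieved for that query if a basket or grass is detected in its image, which is the kind of gap that commonsense knowledge is hypothesized to fill when detection is imperfect.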