CV | May 16, 2021

Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval

arXiv:2105.07391v2 | 6 citations | Has Code
Originality: Synthesis-oriented
AI Analysis

As a survey paper, its contribution is incremental: it summarizes existing work for researchers in the field.

This paper surveys visual-semantic embedding methods for zero-shot image retrieval, focusing on technological trends, datasets, and evaluation results to provide a comprehensive overview and encourage further research.

Visual-semantic embedding is an interesting research topic because it is useful for various tasks, such as visual question answering (VQA), image-text retrieval, image captioning, and scene graph generation. In this paper, we focus on zero-shot image retrieval using sentences as queries and present a survey of the technological trends in this area. First, we provide a comprehensive overview of the history of the technology, starting with the early studies of image-to-text matching and tracing how the technology has evolved over time. We then describe the datasets commonly used in experiments and compare the evaluation results of each method. We also introduce implementations available on GitHub that can be used to confirm the accuracy of experiments and to make further improvements. We hope that this survey will encourage researchers to further develop their work on bridging images and language.
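To make the retrieval setting concrete, here is a minimal sketch of sentence-to-image retrieval in a shared embedding space. It assumes that sentence and image encoders (not shown) have already produced vectors of the same dimension, that images are ranked by cosine similarity, and that results are scored with Recall@K. That metric is an assumption on our part, since the abstract does not name one. The function names and the toy vectors are hypothetical and are not taken from any surveyed method.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def rank_images(query_embedding, image_embeddings):
    """Return image indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_embedding, img) for img in image_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def recall_at_k(query_embeddings, image_embeddings, ground_truth, k):
    """Fraction of queries whose ground-truth image appears in the top k results."""
    hits = 0
    for query, target in zip(query_embeddings, ground_truth):
        if target in rank_images(query, image_embeddings)[:k]:
            hits += 1
    return hits / len(query_embeddings)


# Toy 2-D embeddings (made-up values standing in for real encoder outputs).
image_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query_embeddings = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.2]]
ground_truth = [0, 1, 2]  # index of the correct image for each sentence query

print(rank_images(query_embeddings[0], image_embeddings))
print(recall_at_k(query_embeddings, image_embeddings, ground_truth, k=1))
print(recall_at_k(query_embeddings, image_embeddings, ground_truth, k=2))
```

In the zero-shot setting, the queries and images at test time come from data the encoders were not trained on, so the ranking step stays the same and only the source of the embeddings changes.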
