CV · CL · Mar 14, 2023

Efficient Image-Text Retrieval via Keyword-Guided Pre-Screening

arXiv:2303.07740v1 · 4 citations · h-index: 77
Originality: Incremental advance
AI Analysis

This paper addresses efficiency bottlenecks that limit image-text retrieval in practical applications. It is an incremental improvement.

The paper tackles the high time complexity of image-text retrieval methods by introducing a keyword-guided pre-screening framework that uses keyword matching to filter irrelevant samples, achieving O(1) querying time complexity while maintaining performance on Flickr30K and MS-COCO datasets.

Despite rapid gains in accuracy, current image-text retrieval methods suffer from time complexity that scales with the number of gallery samples $N$, which hinders their application in practice. To improve efficiency, this paper presents a simple and effective keyword-guided pre-screening framework for image-text retrieval. Specifically, we convert the image and text data into keywords and perform keyword matching across modalities to exclude a large number of irrelevant gallery samples before they reach the retrieval network. For keyword prediction, we cast the task as a multi-label classification problem and propose a multi-task learning scheme that appends multi-label classifiers to the image-text retrieval network, yielding lightweight and high-performance keyword prediction. For keyword matching, we adopt the inverted index used in search engines, which benefits both the time and the space complexity of the pre-screening. Extensive experiments on two widely used datasets, Flickr30K and MS-COCO, verify the effectiveness of the proposed framework. Equipped with only two embedding layers, the proposed framework achieves $O(1)$ querying time complexity and, when applied before common image-text retrieval methods, improves their retrieval efficiency while preserving their performance. Our code will be released.
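The keyword-matching step described in the abstract can be illustrated with a minimal inverted-index sketch. This is not the authors' code: the function names, the toy gallery, and the matching rule (keep a gallery item if it shares at least one keyword with the query) are assumptions made for illustration, and in the paper the keywords would come from the learned multi-label classifiers rather than being written by hand.

```python
from collections import defaultdict


def build_inverted_index(gallery_keywords):
    """Map each keyword to the set of gallery ids whose keywords contain it."""
    index = defaultdict(set)
    for item_id, keywords in gallery_keywords.items():
        for kw in keywords:
            index[kw].add(item_id)
    return index


def pre_screen(index, query_keywords):
    """Return the gallery ids that share at least one keyword with the query.

    The "at least one shared keyword" rule is an assumption of this sketch.
    """
    candidates = set()
    for kw in query_keywords:
        # .get avoids creating empty entries for unseen keywords
        candidates |= index.get(kw, set())
    return candidates


# Hypothetical gallery: image ids with hand-written keywords.
gallery = {
    "img_0": {"dog", "grass"},
    "img_1": {"cat", "sofa"},
    "img_2": {"dog", "beach"},
    "img_3": {"car", "street"},
}

index = build_inverted_index(gallery)
candidates = pre_screen(index, {"dog", "frisbee"})
print(sorted(candidates))
```

Each keyword lookup is a hash-table access, so finding the posting set for a keyword does not require scanning the gallery. Only the surviving candidates would then be passed to the more expensive retrieval network.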

