CV · CL · Mar 27, 2019

Image search using multilingual texts: a cross-modal learning approach between image and text

arXiv:1903.11299v3 · 13 citations
Originality: Incremental advance
AI Analysis

This paper addresses cross-lingual image retrieval, that is, searching images with text queries written in multiple languages. It appears incremental, since it builds on existing multilingual embedding techniques.

The paper tackles image search with multilingual text queries by embedding images and texts into a shared vector space. It reports experimental evidence on two datasets, COCO and Multi30K, though the abstract gives no specific numbers.

Multilingual (or cross-lingual) embeddings represent several languages in a single vector space. A common embedding space gives words from different languages a shared semantics. In this paper, we propose to embed images and texts into a single distributional vector space, which makes it possible to search images with text queries that express information needs about the (visual) content of images, as well as by image similarity. Our framework forces the representation of an image to be similar to the representation of the text that describes it. Moreover, by using multilingual embeddings, we ensure that words from two different languages have close descriptors and are thus attached to similar images. We provide experimental evidence of the efficiency of our approach by evaluating it on two datasets: Common Objects in COntext (COCO) [19] and Multi30K [7].
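
The retrieval idea in the abstract can be illustrated with a minimal sketch. This is not the paper's model: the word vectors, image vectors, file names, the bag-of-words averaging, and the function names (`embed_text`, `search_images`, `similar_images`) are all hypothetical, chosen only to show how a shared space lets queries in two languages reach the same images and how the same space supports image-to-image similarity.

```python
import math

# Hypothetical toy multilingual word embeddings (assumption): a word and its
# translation are given nearby vectors, as a multilingual space would provide.
WORD_VECS = {
    "dog":   [0.90, 0.10, 0.00],
    "chien": [0.88, 0.12, 0.02],
    "cat":   [0.10, 0.90, 0.00],
    "chat":  [0.12, 0.88, 0.01],
    "beach": [0.00, 0.10, 0.90],
    "plage": [0.02, 0.08, 0.90],
}

# Hypothetical image embeddings (assumption), taken as already projected into
# the same space as the words, close to the text that describes each image.
IMAGE_VECS = {
    "dog_on_beach.jpg": [0.60, 0.05, 0.60],
    "cat_indoors.jpg":  [0.05, 0.95, 0.05],
    "empty_beach.jpg":  [0.00, 0.10, 0.95],
}


def embed_text(query):
    """Average the vectors of the known query words (a simplifying assumption)."""
    vecs = [WORD_VECS[w] for w in query.lower().split() if w in WORD_VECS]
    if not vecs:
        raise ValueError("query contains no known words")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search_images(query, k=3):
    """Rank images by cosine similarity to the embedded text query."""
    q = embed_text(query)
    ranked = sorted(IMAGE_VECS, key=lambda n: cosine(q, IMAGE_VECS[n]), reverse=True)
    return ranked[:k]


def similar_images(name, k=2):
    """Rank the other images by cosine similarity to a given image."""
    ref = IMAGE_VECS[name]
    others = [n for n in IMAGE_VECS if n != name]
    ranked = sorted(others, key=lambda n: cosine(ref, IMAGE_VECS[n]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    print(search_images("dog beach"))    # English query
    print(search_images("chien plage"))  # French query with the same meaning
    print(similar_images("empty_beach.jpg"))
```

In this toy setup the English and French queries rank the same image first because their word vectors are nearly identical. In a real system the image vectors would come from a trained model rather than being written by hand.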
