CV, IR | Jun 19, 2020

Compositional Learning of Image-Text Query for Image Retrieval

arXiv:2006.11149v3 | 118 citations | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the need for more accurate image retrieval in applications such as e-commerce, where users want to find products with specific modifications. It is an incremental advance because it builds on existing methods such as TIRG.

The paper tackles the problem of retrieving images from a database using multi-modal queries, in which an image is paired with text that specifies a modification, such as changing the color of a dress. It proposes ComposeAE, an autoencoder-based model trained with deep metric learning and a rotational symmetry constraint. ComposeAE outperforms the state-of-the-art TIRG method on the MIT-States, Fashion200k, and Fashion IQ datasets.

In this paper, we investigate the problem of retrieving images from a database based on a multi-modal (image-text) query. Specifically, the query text prompts some modification in the query image, and the task is to retrieve images with the desired modifications. For instance, a user of an e-commerce platform is interested in buying a dress that should look similar to her friend's dress, but in white and with a ribbon sash. In this case, we would like the algorithm to retrieve dresses that apply the desired modifications to the query dress. We propose an autoencoder-based model, ComposeAE, to learn the composition of image and text query for retrieving images. We adopt a deep metric learning approach and learn a metric that pushes the composition of source image and text query closer to the target images. We also propose a rotational symmetry constraint on the optimization problem. Our approach outperforms the state-of-the-art method TIRG on three benchmark datasets: MIT-States, Fashion200k, and Fashion IQ. To ensure a fair comparison, we introduce strong baselines by enhancing the TIRG method. To ensure reproducibility of the results, we publish our code here: https://github.com/ecom-research/ComposeAE.
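The metric-learning idea in the abstract (pull the composed image-text query toward its target image, away from other images) can be illustrated with a small sketch. This is not the ComposeAE architecture or its published loss: the `compose` function below is a hypothetical placeholder (element-wise sum followed by L2 normalization), the feature vectors are toy values, and the batch softmax cross-entropy loss is one common deep metric learning objective chosen here as an assumption. The rotational symmetry constraint is not modeled.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def compose(image_feat, text_feat):
    # Hypothetical placeholder composition (NOT ComposeAE's):
    # element-wise sum of the two feature vectors, then L2-normalize.
    return normalize([a + b for a, b in zip(image_feat, text_feat)])

def batch_softmax_loss(queries, targets):
    """Mean cross-entropy in which queries[i] should match targets[i];
    the other targets in the batch act as negatives."""
    total = 0.0
    for i, q in enumerate(queries):
        sims = [dot(q, t) for t in targets]
        # Numerically stable log-sum-exp over all similarities.
        m = max(sims)
        log_denominator = m + math.log(sum(math.exp(s - m) for s in sims))
        total += log_denominator - sims[i]
    return total / len(queries)

# Toy features (assumed values, for illustration only).
image_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
text_feats = [[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]]
queries = [compose(img, txt) for img, txt in zip(image_feats, text_feats)]

# Targets that match each composed query, and the same targets in swapped order.
aligned_targets = [normalize([1.0, 0.0, 1.0]), normalize([0.0, 1.0, -1.0])]
swapped_targets = aligned_targets[::-1]

loss_aligned = batch_softmax_loss(queries, aligned_targets)
loss_swapped = batch_softmax_loss(queries, swapped_targets)
print(loss_aligned < loss_swapped)
```

The loss is lower when each composed query lies close to its own target, so minimizing it during training moves composed queries toward their targets. In a real system the features would come from learned image and text encoders, and the composition would itself be learned.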

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
