CV | IR | MM | Feb 18, 2021

Hierarchical Similarity Learning for Language-based Product Image Retrieval

arXiv:2102.09375v1 | 7 citations | Has Code
AI Analysis

This work addresses product retrieval for e-commerce users by improving cross-modal matching, but it appears incremental as it builds on existing methods with a focus on granularity.

The paper tackles language-based product image retrieval by proposing a Hierarchical Similarity Learning (HSL) network that computes cross-modal similarities at multiple granularities, and experiments on a large-scale dataset demonstrate its effectiveness.

This paper addresses the language-based product image retrieval task. Most previous works have made significant progress by designing network structures, similarity measurements, and loss functions. However, they typically perform vision-text matching at a certain granularity, regardless of the intrinsic multiple granularities of images. In this paper, we focus on cross-modal similarity measurement and propose a novel Hierarchical Similarity Learning (HSL) network. HSL first learns multi-level representations of the input data with stacked encoders, then computes object-granularity and image-granularity similarities at each level. All the similarities are combined into the final hierarchical cross-modal similarity. Experiments on a large-scale product retrieval dataset demonstrate the effectiveness of the proposed method. Code and data are available at https://github.com/liufh1/hsl.
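The multi-level similarity described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how each similarity is computed or how the similarities are combined, so the choices here are assumptions made purely for illustration. Object-granularity similarity is taken as the best cosine match between any object feature and the query feature, image-granularity similarity as the cosine between the mean-pooled object features and the query feature, and the combination as a plain sum. The stacked encoders are omitted; the hypothetical inputs `image_levels` and `text_levels` stand in for their per-level outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_pool(vectors):
    """Average a non-empty list of equal-length vectors into one vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def object_similarity(object_feats, text_feat):
    """Assumed object-granularity score: best-matching object vs. the query."""
    return max(cosine(obj, text_feat) for obj in object_feats)


def image_similarity(object_feats, text_feat):
    """Assumed image-granularity score: pooled objects vs. the query."""
    return cosine(mean_pool(object_feats), text_feat)


def hierarchical_similarity(image_levels, text_levels):
    """Sum both similarities over all levels (summation is an assumption).

    image_levels: one list of object feature vectors per encoder level.
    text_levels:  one query feature vector per encoder level.
    """
    if len(image_levels) != len(text_levels):
        raise ValueError("image and text must have the same number of levels")
    total = 0.0
    for object_feats, text_feat in zip(image_levels, text_levels):
        total += object_similarity(object_feats, text_feat)
        total += image_similarity(object_feats, text_feat)
    return total


# Toy single-level example with 2-d features (values are made up).
query = [[1.0, 0.0]]
image_a = [[[1.0, 0.0], [0.0, 1.0]]]  # one object aligned with the query
image_b = [[[0.0, 1.0], [0.0, 1.0]]]  # no object aligned with the query
score_a = hierarchical_similarity(image_a, query)
score_b = hierarchical_similarity(image_b, query)
```

In a retrieval setting, candidate images would be ranked by this score for a given query, so `image_a` would be placed ahead of `image_b` here. A learned model would replace the fixed pooling and summation with trained components.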

Code Implementations: 1 repo
