IR · LG · Jan 13, 2025

Multimodal semantic retrieval for product search

arXiv:2501.07365v3 · 5 citations · h-index: 1 · WWW
Originality: Incremental advance
AI Analysis

This paper addresses the need for better product search in e-commerce by leveraging multimodal data, though it appears incremental because it builds on existing semantic retrieval techniques.

The paper tackles the problem of incorporating product images into semantic retrieval for e-commerce search and demonstrates that multimodal representations can improve either purchase recall or relevance accuracy compared to text-only methods.

Semantic retrieval (also known as dense retrieval) based on textual data has been extensively studied for both web search and product search, where the relevance of a query to a candidate document is computed by comparing their dense vector representations. Product images are crucial to e-commerce search interactions and are a key factor for customers during product exploration. However, their impact on semantic retrieval has not yet been well studied. In this research, we build a multimodal representation for product items in e-commerce search, in contrast to a pure-text representation of products, and investigate the impact of such representations. The models are developed and evaluated on e-commerce datasets. We demonstrate that a multimodal representation scheme for a product can improve either purchase recall or relevance accuracy in semantic retrieval. Additionally, we provide a numerical analysis of the exclusive matches retrieved by a multimodal semantic retrieval model versus a text-only semantic retrieval model, to validate the multimodal solution.
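
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the paper's model: the fusion rule (a weighted average of normalized text and image vectors in a shared space), the `alpha` weight, the function names, and the toy 2-dimensional embeddings are all assumptions made for illustration. It scores products against a query by cosine similarity, once with text-only vectors and once with fused vectors, then lists the matches that only one of the two retrievers returns.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    u, v = normalize(u), normalize(v)
    return sum(a * b for a, b in zip(u, v))


def fuse(text_vec, image_vec, alpha=0.5):
    """Hypothetical fusion: weighted average of the normalized text and image
    vectors. Assumes both live in the same embedding space; the paper's actual
    fusion scheme is not given in the abstract."""
    t, i = normalize(text_vec), normalize(image_vec)
    return normalize([alpha * a + (1 - alpha) * b for a, b in zip(t, i)])


def retrieve(query_vec, item_vecs, k):
    """Return the ids of the k items most similar to the query."""
    ranked = sorted(item_vecs,
                    key=lambda pid: cosine(query_vec, item_vecs[pid]),
                    reverse=True)
    return ranked[:k]


def exclusive_matches(results_a, results_b):
    """Items retrieved by the first model but not by the second."""
    return set(results_a) - set(results_b)


# Toy embeddings, invented for this example.
query = [1.0, 0.0]
text_vecs = {"A": [0.9, 0.1], "B": [0.8, 0.6]}
image_vecs = {"A": [0.0, 1.0], "B": [1.0, 0.0]}

# Product representations for the multimodal retriever.
multimodal_vecs = {pid: fuse(text_vecs[pid], image_vecs[pid]) for pid in text_vecs}

text_hits = retrieve(query, text_vecs, k=1)
mm_hits = retrieve(query, multimodal_vecs, k=1)

only_multimodal = exclusive_matches(mm_hits, text_hits)
only_text = exclusive_matches(text_hits, mm_hits)
print("text-only:", text_hits, "multimodal:", mm_hits)
print("exclusive to multimodal:", only_multimodal, "exclusive to text:", only_text)
```

In this toy setup the image vector of product B agrees with the query while its text vector is only a partial match, so the fused representation promotes B over A. Counting such exclusive matches over a real query set is one way to quantify what each representation contributes.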

Code Implementations: 1 repo
