IR · LG · May 31

Semantic Retrieval for Product Search in E-Commerce

arXiv:2606.01504 · 62.6
AI Analysis

For e-commerce platforms, the paper offers a practical way to improve product search relevance while handling fine-grained attribute distinctions and noisy queries.

The paper tackles semantic product search in e-commerce, where queries are short and noisy. It proposes a Siamese LLM dual-encoder trained in two stages: contrastive learning with a false-negative margin mask, followed by ROAR preference optimization. The resulting model retrieves exact matches accurately and orders substitutes correctly, with gains confirmed across query-frequency strata and business verticals and validated in a live A/B deployment.

Semantic retrieval in e-commerce must handle short, noisy, and colloquial queries over large product catalogs with fine-grained attribute distinctions. We present a Siamese LLM dual-encoder trained through a two-stage pipeline: contrastive learning with a false-negative margin mask to prevent penalization of near-duplicate products, followed by Relative Odds Alignment for Retrieval (ROAR), a preference optimization objective that extends Bradley-Terry to variable-sized graded relevance groups via consecutive odds-ratio margins. The training corpus mirrors this progression: substitute query-product pairs provide coarse semantic supervision in Stage 1, and graded relevance annotations drive fine-grained ranking in Stage 2. The resulting system accurately retrieves exact matches while correctly ordering substitutes and complementary products, with gains confirmed across query-frequency strata and business verticals, and statistical significance validated through live A/B deployment at scale.
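
The two training objectives named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the function names, the mask rule (drop an in-batch negative when its similarity exceeds the positive's by more than a margin), the temperature, and the reading of model scores as log-odds so that a score difference acts as a log odds ratio. It is not the paper's ROAR objective, only a Bradley-Terry-style loss applied between consecutive relevance grades.

```python
import math


def masked_info_nce(sim, margin=0.1, temperature=0.05):
    """In-batch contrastive loss with a false-negative mask (illustrative only).

    sim[i][j] is the similarity of query i and product j; positives sit on the diagonal.
    Assumed mask rule: product j is excluded from query i's negatives when
    sim[i][j] > sim[i][i] + margin, i.e. it looks like an unlabeled near-duplicate.
    """
    losses = []
    for i, row in enumerate(sim):
        pos = row[i]
        kept = [s for j, s in enumerate(row) if j != i and s <= pos + margin]
        logits = [pos / temperature] + [s / temperature for s in kept]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denominator - pos / temperature)
    return sum(losses) / len(losses)


def consecutive_margin_loss(scores, grades, margin=1.0):
    """Bradley-Terry-style loss between adjacent relevance grades for one query.

    scores: one model score per product, treated here as log-odds (an assumption).
    grades: integer relevance labels, higher means more relevant; groups may differ in size.
    """
    groups = {}
    for score, grade in zip(scores, grades):
        groups.setdefault(grade, []).append(score)
    levels = sorted(groups, reverse=True)
    terms = []
    # Only adjacent grade levels are compared, e.g. exact vs substitute, substitute vs complement.
    for higher, lower in zip(levels, levels[1:]):
        for s_hi in groups[higher]:
            for s_lo in groups[lower]:
                z = s_hi - s_lo - margin
                # -log(sigmoid(z)) computed as a stable softplus(-z)
                terms.append(max(-z, 0.0) + math.log1p(math.exp(-abs(z))))
    return sum(terms) / len(terms) if terms else 0.0
```

In this sketch, Stage 1 would minimize `masked_info_nce` over batches of query-product pairs and Stage 2 would minimize `consecutive_margin_loss` over each query's graded candidates. Comparing only adjacent grades keeps the number of pairwise terms manageable when the groups vary in size.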

