IR · Apr 28

RAD-DPO: Robust Adaptive Denoising Direct Preference Optimization for Generative Retrieval in E-commerce

arXiv:2602.23964 · 61.9 · h-index: 4
AI Analysis

For e-commerce search systems using generative retrieval, RAD-DPO provides a robust alignment method that handles structured outputs and noisy feedback, enabling more accurate and efficient retrieval in industrial deployments.

RAD-DPO addresses three limitations of applying Direct Preference Optimization to generative retrieval with structured Semantic IDs: prefix gradient conflicts, noisy pseudo-negatives, and probability squeezing in multi-label queries. It achieves significant improvements in retrieval precision and training efficiency, validated through offline evaluations and large-scale online A/B testing on JD.com's core search engine.
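The summary names the limitations but not the exact loss. The following minimal pure-Python sketch shows how the first two fixes named in the abstract (token-level gradient detachment and similarity-based dynamic reward weighting) could attach to a standard pairwise DPO term over SID tokens. Everything beyond the standard DPO margin is an assumption, not the paper's formula. That includes the helper names, detaching only the rejected item's shared-prefix tokens, and the `1 - sim` weighting rule with a `min_weight` floor. Gradients are written out analytically with respect to per-token log-probabilities, so the effect of detachment is visible without an autodiff library.

```python
import math
from typing import List, Sequence, Tuple


def shared_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of leading SID tokens two item codes have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def neg_log_sigmoid(z: float) -> float:
    """-log(sigmoid(z)) without overflow."""
    return math.log1p(math.exp(-z)) if z >= 0 else -z + math.log1p(math.exp(z))


def weighted_detached_dpo(
    chosen_sid: Sequence[int], rejected_sid: Sequence[int],
    pi_w: Sequence[float], ref_w: Sequence[float],  # per-token log-probs, chosen SID
    pi_l: Sequence[float], ref_l: Sequence[float],  # per-token log-probs, rejected SID
    sim: float,               # hypothetical chosen/rejected item similarity in [-1, 1]
    beta: float = 0.1,
    min_weight: float = 0.1,  # hypothetical floor so no pair is ignored entirely
) -> Tuple[float, List[float], List[float]]:
    """Return (loss, dL/dpi_w, dL/dpi_l) for one preference pair."""
    # Standard sequence-level DPO margin: summed per-token log-ratios.
    z = beta * ((sum(pi_w) - sum(ref_w)) - (sum(pi_l) - sum(ref_l)))

    # Assumed noise rule: a pseudo-negative very similar to the positive is
    # more likely an unclicked relevant item, so its pair is down-weighted.
    weight = max(min_weight, min(1.0, 1.0 - sim))
    loss = weight * neg_log_sigmoid(z)

    # dL/dz = -weight * (1 - sigmoid(z)); dz/dpi_w[t] = beta, dz/dpi_l[t] = -beta.
    g = weight * beta * (1.0 - sigmoid(z))
    grad_w = [-g] * len(pi_w)
    grad_l = [g] * len(pi_l)

    # Token-level detachment (assumed form): the rejected SID's tokens on the
    # prefix it shares with the chosen SID are treated as constants, so this
    # pair does not push the shared hierarchical prefix down.
    for t in range(shared_prefix_len(chosen_sid, rejected_sid)):
        grad_l[t] = 0.0
    return loss, grad_w, grad_l


loss, gw, gl = weighted_detached_dpo(
    chosen_sid=[3, 7, 1], rejected_sid=[3, 7, 5],
    pi_w=[-0.5, -0.7, -1.0], ref_w=[-0.5, -0.7, -1.0],
    pi_l=[-0.5, -0.7, -1.2], ref_l=[-0.5, -0.7, -1.2],
    sim=0.0,
)
print(round(loss, 4), gl)  # prints 0.6931 [0.0, 0.0, 0.05]
```

With equal policy and reference log-probs the margin is zero and the loss is log 2. The rejected item's shared prefix tokens (`3, 7`) receive zero gradient, and only its diverging last token is pushed down. In an autodiff trainer, the same masking would typically be applied with the framework's detach or stop-gradient on the masked positions.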

Generative Retrieval (GR) is rapidly transforming e-commerce search by replacing traditional multi-stage pipelines with the autoregressive decoding of structured Semantic IDs (SIDs). Despite this architectural efficiency, aligning GR models with nuanced, real-world user preferences remains a critical challenge. While Direct Preference Optimization (DPO) offers an efficient alignment solution, its direct application to structured SIDs suffers from three limitations: (i) it penalizes shared hierarchical prefixes, causing gradient conflicts; (ii) it is vulnerable to noisy pseudo-negatives from implicit feedback; and (iii) in multi-label queries with multiple relevant items, it exacerbates a probability "squeezing effect" among valid candidates. To address these issues, we propose RAD-DPO, which introduces token-level gradient detachment to protect prefix structures, similarity-based dynamic reward weighting to mitigate label noise, and a multi-label global contrastive objective integrated with global SFT loss to explicitly expand positive coverage. Extensive offline evaluations and large-scale online A/B testing on JD.com's core search engine demonstrate that RAD-DPO achieves significant improvements in both retrieval precision and training efficiency, proving its robustness for massive industrial deployments.
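The abstract names a multi-label global contrastive objective integrated with a global SFT loss but does not state its form. The minimal sketch below assumes a multi-positive InfoNCE-style loss over per-candidate sequence scores (for example, β-scaled policy/reference log-ratios), plus a mean negative log-likelihood over all relevant SIDs. It also illustrates the squeezing effect. A one-positive cross-entropy gives every other relevant item a positive gradient, so gradient descent lowers its score. The multi-positive loss instead gives every relevant item a negative gradient. The function names and the `sft_weight` parameter are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def single_positive_ce(scores: Sequence[float], target: int) -> Tuple[float, List[float]]:
    """One-positive softmax cross-entropy and dLoss/dscore (baseline for comparison)."""
    lse_all = logsumexp(scores)
    loss = lse_all - scores[target]
    grad = [math.exp(s - lse_all) - (1.0 if i == target else 0.0)
            for i, s in enumerate(scores)]
    return loss, grad


def multi_label_contrastive(scores: Sequence[float],
                            is_pos: Sequence[bool]) -> Tuple[float, List[float]]:
    """-log(sum_pos exp(s) / sum_all exp(s)) and dLoss/dscore."""
    pos = [s for s, p in zip(scores, is_pos) if p]
    if not pos:
        raise ValueError("need at least one relevant candidate")
    lse_all = logsumexp(scores)
    lse_pos = logsumexp(pos)
    loss = lse_all - lse_pos
    grad = []
    for s, p in zip(scores, is_pos):
        share_all = math.exp(s - lse_all)                # probability among all candidates
        share_pos = math.exp(s - lse_pos) if p else 0.0  # probability among positives only
        grad.append(share_all - share_pos)
    return loss, grad


def global_sft(pos_logprobs: Sequence[float]) -> float:
    """Mean negative log-likelihood over every relevant SID for the query."""
    return -sum(pos_logprobs) / len(pos_logprobs)


def multi_label_objective(scores: Sequence[float], is_pos: Sequence[bool],
                          pos_logprobs: Sequence[float], sft_weight: float = 1.0) -> float:
    """Hypothetical combination: contrastive term plus weighted global SFT term."""
    contrastive, _ = multi_label_contrastive(scores, is_pos)
    return contrastive + sft_weight * global_sft(pos_logprobs)


# Query with two relevant items (indices 0 and 1) and two negatives.
scores = [1.0, 1.0, 0.0, 0.0]
_, g_single = single_positive_ce(scores, target=0)
_, g_multi = multi_label_contrastive(scores, [True, True, False, False])
print(round(g_single[1], 3), round(g_multi[1], 3))  # prints 0.366 -0.134
```

The sign flip on the second relevant item is the point of the comparison. The one-positive loss treats it as competition, while the multi-positive loss raises it together with the first. The SFT term raises the likelihood of every relevant SID directly, independent of the negatives.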

