IR · Apr 28

RAD-DPO: Robust Adaptive Denoising Direct Preference Optimization for Generative Retrieval in E-commerce

Zhiguo Chen, Guohao Sun, Yiming Qiu, Xingzhi Yao, Mingming Li, Huimu Wang, Yangqi Zhang, Songlin Wang, Sulong Xu

arXiv:2602.23964 · 61.9 · h-index: 4

AI Analysis

For e-commerce search systems using generative retrieval, RAD-DPO provides a robust alignment method that handles structured outputs and noisy feedback, enabling more accurate and efficient retrieval in industrial deployments.

RAD-DPO addresses three limitations of applying Direct Preference Optimization to generative retrieval with structured Semantic IDs: prefix gradient conflicts, noisy pseudo-negatives, and probability squeezing in multi-label queries. It achieves significant improvements in retrieval precision and training efficiency, validated through offline evaluations and large-scale online A/B testing on JD.com's core search engine.

Generative Retrieval (GR) is rapidly transforming e-commerce search by replacing traditional multi-stage pipelines with the autoregressive decoding of structured Semantic IDs (SIDs). Despite this architectural efficiency, aligning GR models with nuanced, real-world user preferences remains a critical challenge. While Direct Preference Optimization (DPO) offers an efficient alignment solution, its direct application to structured SIDs suffers from three limitations: (i) it penalizes shared hierarchical prefixes, causing gradient conflicts; (ii) it is vulnerable to noisy pseudo-negatives from implicit feedback; and (iii) in multi-label queries with multiple relevant items, it exacerbates a probability "squeezing effect" among valid candidates. To address these issues, we propose RAD-DPO, which introduces token-level gradient detachment to protect prefix structures, similarity-based dynamic reward weighting to mitigate label noise, and a multi-label global contrastive objective integrated with global SFT loss to explicitly expand positive coverage. Extensive offline evaluations and large-scale online A/B testing on JD.com's core search engine demonstrate that RAD-DPO achieves significant improvements in both retrieval precision and training efficiency, proving its robustness for massive industrial deployments.
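The prefix issue described in (i) can be made concrete with a small sketch. The code below computes the standard DPO loss for one chosen/rejected SID pair and the gradient of that loss with respect to each token's policy log-probability. It is not the paper's implementation: the function names, the choice to detach only the rejected sequence's shared-prefix tokens, and the scalar `weight` (a stand-in for whatever the similarity-based reward weighting produces) are all assumptions made for illustration.

```python
import math


def shared_prefix_len(chosen, rejected):
    """Count the leading SID tokens that both sequences have in common."""
    n = 0
    for a, b in zip(chosen, rejected):
        if a != b:
            break
        n += 1
    return n


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def dpo_token_grads(chosen, rejected, pol_c, ref_c, pol_r, ref_r,
                    beta=0.1, weight=1.0, detach_prefix=True):
    """Return (loss, grads_chosen, grads_rejected).

    pol_* / ref_* are per-token log-probs under the policy and the frozen
    reference model. Each grads list holds d(loss)/d(policy log-prob) per token.
    `weight` is a hypothetical per-pair scalar (assumed, not from the paper).
    """
    # Standard DPO margin: difference of sequence-level log-ratios.
    z = beta * ((sum(pol_c) - sum(ref_c)) - (sum(pol_r) - sum(ref_r)))
    loss = -weight * log_sigmoid(z)

    # d(loss)/dz = -weight * (1 - sigmoid(z)); exp(log_sigmoid(z)) == sigmoid(z).
    g = weight * beta * (1.0 - math.exp(log_sigmoid(z)))

    # Assumption: detachment blocks gradients on the rejected side's shared
    # prefix, so those tokens keep their value in the loss but are not pushed down.
    k = shared_prefix_len(chosen, rejected) if detach_prefix else 0
    grads_c = [-g] * len(chosen)  # descent raises chosen log-probs
    grads_r = [0.0 if t < k else g for t in range(len(rejected))]
    return loss, grads_c, grads_r


# Two SIDs that share the first two hierarchy levels and differ at the leaf.
chosen, rejected = [12, 7, 3], [12, 7, 9]
pol = [-0.5, -0.7, -1.0]  # toy log-probs; policy equals reference here
loss, gc, gr = dpo_token_grads(chosen, rejected, pol, pol, pol, pol)
_, _, gr_plain = dpo_token_grads(chosen, rejected, pol, pol, pol, pol,
                                 detach_prefix=False)
```

Without detachment, the rejected-side gradient on the two shared prefix tokens has the same magnitude and opposite sign as the chosen-side gradient, which is the conflict the abstract refers to. With detachment, only the diverging leaf token of the rejected SID receives a penalty, and the loss value itself is unchanged.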
