IR · LG · Apr 8

Dual-Rerank: Fusing Causality and Utility for Industrial Generative Reranking

arXiv:2604.07420 · 1.9 · h-index: 3
Predicted impact top 94% in IR · last 90 days
Originality: Incremental advance
AI Analysis

This work addresses reranking for large-scale video search platforms such as Kuaishou. It is an incremental advance that fuses causality and utility to make generative reranking efficient enough to improve user experience in production.

The paper tackles a dual dilemma in deploying generative reranking for industrial-scale video search. Structurally, autoregressive models are too slow, while non-autoregressive models fail to capture dependencies between items. On the optimization side, supervised learning struggles to optimize whole-page utility directly, and reinforcement learning is unstable on high-throughput data streams. The proposed Dual-Rerank framework is reported to achieve state-of-the-art performance in online A/B tests, improving user satisfaction and watch time while reducing inference latency relative to autoregressive baselines.

Kuaishou serves over 400 million daily active users, processing hundreds of millions of search queries daily against a repository of tens of billions of short videos. As the final decision layer, the reranking stage determines user experience by optimizing whole-page utility. While traditional score-and-sort methods fail to capture combinatorial dependencies, Generative Reranking offers a superior paradigm by directly modeling the permutation probability. However, deploying Generative Reranking in such a high-stakes environment faces a fundamental dual dilemma: 1) the structural trade-off, where Autoregressive (AR) models offer superior sequential modeling but suffer from prohibitive latency, whereas Non-Autoregressive (NAR) models are efficient but fail to capture dependencies; 2) the optimization gap, where Supervised Learning faces challenges in directly optimizing whole-page utility, while Reinforcement Learning (RL) struggles with instability in high-throughput data streams. To resolve this, we propose Dual-Rerank, a unified framework designed for industrial reranking that bridges the structural gap via Sequential Knowledge Distillation and addresses the optimization gap using List-wise Decoupled Reranking Optimization (LDRO) for stable online RL. Extensive A/B testing on production traffic demonstrates that Dual-Rerank achieves state-of-the-art performance, significantly improving user satisfaction and watch time while drastically reducing inference latency compared to AR baselines.
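
To make the contrast between score-and-sort and modeling the permutation probability concrete, here is a minimal sketch. It is not the Dual-Rerank model: the item names, logits, topics, and the `REPEAT_PENALTY` rule are all hypothetical, chosen only to show how an autoregressive factorization, log P(order) = Σ_t log P(order[t] | order[:t]), can prefer a different page than independent per-item scores.

```python
import math
from itertools import permutations

# Hypothetical candidates: (base relevance logit, topic). Values are made up.
ITEMS = {"v1": (2.0, "cooking"), "v2": (1.8, "cooking"), "v3": (1.0, "travel")}
REPEAT_PENALTY = 1.5  # assumed penalty for repeating the previous item's topic


def step_logits(prefix, remaining):
    """Logit of each remaining item given the already placed prefix."""
    last_topic = ITEMS[prefix[-1]][1] if prefix else None
    return {
        item: ITEMS[item][0]
        - (REPEAT_PENALTY if ITEMS[item][1] == last_topic else 0.0)
        for item in remaining
    }


def sequence_log_prob(order):
    """Chain rule: log P(order) = sum over t of log P(order[t] | order[:t])."""
    log_p = 0.0
    remaining = list(order)
    for t, item in enumerate(order):
        logits = step_logits(order[:t], remaining)
        log_norm = math.log(sum(math.exp(v) for v in logits.values()))
        log_p += logits[item] - log_norm
        remaining.remove(item)
    return log_p


def greedy_decode():
    """Autoregressive decoding: one sequential step per slot on the page."""
    prefix, remaining = [], list(ITEMS)
    while remaining:
        logits = step_logits(prefix, remaining)
        choice = max(logits, key=logits.get)
        prefix.append(choice)
        remaining.remove(choice)
    return prefix


# Score-and-sort ignores the prefix and ranks by the base logit alone.
score_and_sort = sorted(ITEMS, key=lambda i: ITEMS[i][0], reverse=True)
generative = greedy_decode()

# The per-step softmaxes define a proper distribution over all permutations.
total_prob = sum(math.exp(sequence_log_prob(p)) for p in permutations(ITEMS))
```

In this toy setup, score-and-sort places the two cooking videos back to back, while the prefix-conditioned decoder moves the travel video into the second slot. The decoding loop also shows where AR latency comes from: each slot needs its own sequential step, which is the cost an NAR model avoids by predicting all slots at once.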

