FlashEvaluator: Expanding Search Space with Parallel Evaluation
This work addresses inefficient and inaccurate evaluation in Recommender Systems and NLP tasks. It is aimed at practitioners and researchers in these fields and offers an incremental yet impactful solution.
The authors address the limitations of traditional evaluators in the Generator-Evaluator framework. Their proposed FlashEvaluator improves both accuracy and efficiency, achieves sublinear computational complexity, and has been deployed in a real-world recommender system with substantial revenue gains.
The Generator-Evaluator (G-E) framework, in which an evaluator scores K sequences from a generator and the top-ranked one is selected, is a foundational paradigm in tasks such as Recommender Systems (RecSys) and Natural Language Processing (NLP). Traditional evaluators process sequences independently and suffer from two major limitations: (1) a lack of explicit cross-sequence comparison, leading to suboptimal accuracy; (2) poor parallelization with linear complexity of O(K), resulting in inefficient resource utilization and a negative impact on both throughput and latency. To address these challenges, we propose FlashEvaluator, which enables cross-sequence token information sharing and processes all sequences in a single forward pass. This yields sublinear computational complexity, which improves the system's efficiency, and supports direct inter-sequence comparisons, which improve selection accuracy. We also provide theoretical proofs and extensive experiments on recommendation and NLP tasks, demonstrating clear advantages over conventional methods. Notably, FlashEvaluator has been deployed in the online recommender system of Kuaishou, delivering substantial and sustained revenue gains in practice.
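The selection step of the G-E framework can be illustrated with a minimal Python sketch. It contrasts the traditional pattern (one evaluator call per sequence, K calls in total) with a joint pattern (one call that sees all K sequences). This is not the FlashEvaluator architecture: the names `select_independent`, `select_joint`, `toy_score_one`, and `toy_score_all` are hypothetical, and the toy scorers are assumptions chosen only to make the control flow concrete. In particular, the toy joint scorer shows scores that depend on the whole candidate set, but it does not reproduce the sublinear complexity claimed above.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: a candidate is a sequence of item/token ids.
Candidate = Sequence[int]


def argmax(scores: List[float]) -> int:
    """Return the index of the highest score (first index wins ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


def select_independent(
    candidates: List[Candidate],
    score_one: Callable[[Candidate], float],
) -> Tuple[int, List[float]]:
    """Traditional G-E selection: K evaluator calls, one per sequence."""
    scores = [score_one(c) for c in candidates]  # O(K) separate calls
    return argmax(scores), scores


def select_joint(
    candidates: List[Candidate],
    score_all: Callable[[List[Candidate]], List[float]],
) -> Tuple[int, List[float]]:
    """Joint selection: a single evaluator call over all K sequences."""
    scores = score_all(candidates)  # one call; scorer may compare candidates
    return argmax(scores), scores


def toy_score_one(candidate: Candidate) -> float:
    """Toy per-sequence scorer (assumption): mean of the ids."""
    return sum(candidate) / len(candidate)


def toy_score_all(candidates: List[Candidate]) -> List[float]:
    """Toy joint scorer (assumption): each score is relative to the pool mean,
    so it depends on the other candidates in the set."""
    base = [toy_score_one(c) for c in candidates]
    pool_mean = sum(base) / len(base)
    return [b - pool_mean for b in base]


candidates = [[1, 2, 3], [4, 5, 6], [2, 2, 2]]
best_ind, scores_ind = select_independent(candidates, toy_score_one)
best_joint, scores_joint = select_joint(candidates, toy_score_all)
print("independent:", best_ind, scores_ind)
print("joint:", best_joint, scores_joint)
```

In this toy setup both strategies pick the same candidate, since subtracting the pool mean preserves the ranking. A learned joint evaluator could rank candidates differently because its scores are conditioned on the full candidate set.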