IR · May 6

Beyond Static Best-of-N: Bayesian List-wise Alignment for LLM-based Recommendation

Ruijun Chen, Chongming Gao, Jiawei Chen, Weiqin Yang, Xiangnan He

arXiv:2605.04559 · 91.2 · Has Code

AI Analysis

For practitioners of LLM-based recommender systems, BLADE provides a method to optimize non-differentiable list-wise metrics (e.g., NDCG, fairness) more effectively and efficiently than existing approaches.

BLADE introduces a Bayesian list-wise alignment framework for LLM-based recommendation that updates the target distribution during training. This overcomes two limitations of static Best-of-N alignment: indiscriminate supervision and gradient decay. On three real-world datasets it significantly outperforms state-of-the-art baselines and breaks the static performance upper bound in both ranking accuracy (Recall, NDCG) and list-wise metrics (fairness, diversity).
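To make the static Best-of-N baseline concrete, here is a minimal Python sketch. It assumes binary relevance and a sampler that returns one ranked list per call. `ndcg_at_k`, `best_of_n`, and the toy data are hypothetical illustrations and do not come from the BLADE repository.

```python
import math
from typing import Callable, List, Sequence, Set


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Binary-relevance NDCG@k for one recommendation list.

    The score depends only on *where* relevant items land in the list, so it
    has no gradient with respect to the model's token logits.
    """
    dcg = sum(
        1.0 / math.log2(pos + 2)  # pos is 0-based, so rank 1 is discounted by log2(2) = 1
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def best_of_n(
    sample_list: Callable[[], List[str]],
    reward: Callable[[List[str]], float],
    n: int,
) -> List[str]:
    """Draw n candidate lists and return the one with the highest reward.

    Any list-wise metric can serve as the reward, but every request pays for
    n generations, which is the inference cost BoN alignment tries to remove.
    """
    candidates = [sample_list() for _ in range(n)]
    return max(candidates, key=reward)


# Toy usage with a deterministic stand-in for the LLM sampler (hypothetical data).
samples = iter([["x", "y", "a"], ["a", "x", "y"], ["y", "a", "x"]])
best = best_of_n(lambda: next(samples), lambda lst: ndcg_at_k(lst, {"a"}, k=3), n=3)
print(best)  # ['a', 'x', 'y']
```

Because the reward is evaluated only on finished lists, it can be any score computable from a list (a fairness or diversity measure plugs in the same way). This is why BoN handles non-differentiable objectives, and also why its cost grows linearly with n.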

Large Language Models have revolutionized recommender systems (LLM4Rec) by leveraging their generative capabilities to model complex user preferences. However, existing LLM4Rec methods primarily rely on token-level objectives, making it difficult to optimize list-level and non-differentiable metrics (e.g., NDCG, fairness) that define actual recommendation quality. While Best-of-N (BoN) directly optimizes these metrics during inference, its high computational cost hinders real-world deployment. To address this, BoN Alignment aims to distill the search capability into the model itself, yet current approaches suffer from two critical limitations: (1) Indiscriminate Supervision, where the static reference fails to distinguish the relative quality of candidates exceeding its empirical range, leading to a loss of ranking guidance; and (2) Gradient Decay, where the effective supervision signal rapidly diminishes as the evolving policy improves, resulting in inefficient optimization. To overcome these challenges, we propose BLADE (Bayesian List-wise Alignment via Dynamic Estimation). Unlike static approaches, BLADE introduces a Bayesian framework that continuously updates the target distribution by fusing historical priors with dynamic evidence from the model's current rollouts. This mechanism constructs a self-evolving target that adapts to the model's growing capabilities, ensuring the training signal remains informative throughout the learning process. Extensive experiments on three real-world datasets demonstrate that BLADE significantly outperforms state-of-the-art baselines. Crucially, it breaks the static performance upper bound, achieving sustained gains in both ranking accuracy (Recall, NDCG) and complex list-wise metrics (Fairness, Diversity). The code is available via https://github.com/RegionCh/BLADE.
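The abstract does not give BLADE's exact update rule. As intuition only, the sketch below swaps in a textbook normal-normal conjugate update. It treats the target as a Gaussian belief over the reward a good list should reach, with a prior built from historical rewards and evidence from the current policy's rollouts. It also contrasts a static empirical reference, which gives every candidate above its range the same score (the "indiscriminate supervision" problem), with the posterior predictive, which still grades those candidates. `GaussianBelief`, `update_target`, `static_score`, `dynamic_score`, the noise model, and all numbers are hypothetical and are not BLADE's formulation.

```python
import math
from dataclasses import dataclass
from statistics import fmean
from typing import Sequence


@dataclass
class GaussianBelief:
    """Belief over the reward level a 'good' list should reach (hypothetical model)."""
    mean: float
    var: float


def update_target(prior: GaussianBelief, rollout_rewards: Sequence[float], noise_var: float) -> GaussianBelief:
    """Normal-normal conjugate update with known observation noise.

    Posterior precision = prior precision + n / noise_var; the posterior mean is
    the precision-weighted average of the prior mean and the rollout sample mean.
    """
    if not rollout_rewards:
        return prior
    prior_prec = 1.0 / prior.var
    data_prec = len(rollout_rewards) / noise_var
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior.mean + data_prec * fmean(rollout_rewards))
    return GaussianBelief(post_mean, post_var)


def static_score(reward: float, reference: Sequence[float]) -> float:
    """Empirical CDF of a fixed reference set: saturates at 1.0 above its range."""
    return sum(r <= reward for r in reference) / len(reference)


def dynamic_score(reward: float, belief: GaussianBelief, noise_var: float) -> float:
    """Gaussian CDF under the posterior predictive N(mean, var + noise_var)."""
    sd = math.sqrt(belief.var + noise_var)
    return 0.5 * (1.0 + math.erf((reward - belief.mean) / (sd * math.sqrt(2.0))))


# Hypothetical numbers: historical rewards set the prior; the current rollouts are better.
reference = [0.20, 0.25, 0.30, 0.35]
prior = GaussianBelief(mean=fmean(reference), var=0.01)
rollouts = [0.5, 0.6, 0.7]
posterior = update_target(prior, rollouts, noise_var=0.04)

print([static_score(r, reference) for r in rollouts])  # [1.0, 1.0, 1.0]
print([round(dynamic_score(r, posterior, 0.04), 3) for r in rollouts])
```

In this toy setting, the static reference labels all three rollouts identically. The posterior mean moves toward the rollouts, so the dynamic scores stay below 1.0 and keep ranking the candidates. Repeating the update every training step makes the target rise as the policy improves, which is the intuition behind a self-evolving target.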
