LG | Oct 9, 2025

GRADE: Personalized Multi-Task Fusion via Group-relative Reinforcement Learning with Adaptive Dirichlet Exploration

Tingfeng Hong, Pingye Ren, Xinlong Xiao, Chao Wang, Chenyi Lei, Wenwu Ou, Han Li

arXiv:2510.07919v2 | 4.1 | h-index: 4

Originality: Highly original

AI Analysis

This work addresses the challenge of personalizing multi-task fusion for user satisfaction in large-scale recommender systems, representing a novel method rather than an incremental improvement.

The paper tackles the problem of balancing multiple objectives in recommender and search systems by proposing GRADE, a personalized multi-task fusion framework that uses group-relative reinforcement learning with adaptive Dirichlet exploration. In large-scale A/B tests, it reports gains of +0.595% in CTR, +1.193% in CVR, +1.788% in OPM, and +1.568% in total order volume.

Balancing multiple objectives is critical for user satisfaction in modern recommender and search systems, yet current Multi-Task Fusion (MTF) methods rely on static, manually tuned weights that fail to capture individual user intent. While Reinforcement Learning (RL) offers a path to personalization, traditional approaches often falter due to training instability and the sparse rewards inherent in these large-scale systems. To address these limitations, we propose Group-relative Reinforcement learning with Adaptive Dirichlet Exploration (GRADE), a novel and robust framework for personalized multi-task fusion. GRADE leverages a critic-free Group Relative Policy Optimization (GRPO) paradigm, enabling stable and efficient policy learning by evaluating the relative performance of candidate weight groups. Its core innovations are the use of the Dirichlet distribution for principled, structured exploration of the weight space, and a composite reward function that combines sparse user feedback with dense model priors and rule-based constraints to guide the search effectively. Deployed in the in-app marketplace of an application with hundreds of millions of daily active users, GRADE significantly outperforms established baselines, achieving substantial gains in rigorous large-scale A/B tests: +0.595% in CTR, +1.193% in CVR, +1.788% in OPM, and +1.568% in total order volume. Following its strong performance, GRADE has been fully deployed in the marketplace search scenario of Kuaishou, serving hundreds of millions of users.
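The abstract's three ingredients (Dirichlet sampling of candidate weights, a composite reward, and critic-free group-relative scoring) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the coefficients `prior_coef` and `rule_penalty`, the additive form of the reward, and the linear weighted-sum fusion are all assumptions made for illustration. Only the within-group standardization of rewards follows the usual GRPO formulation.

```python
import numpy as np


def sample_weight_group(alpha, group_size, rng):
    """Draw `group_size` candidate fusion-weight vectors from Dirichlet(alpha).

    Each row is non-negative and sums to 1, so every sample is a valid
    weighting over the tasks.
    """
    return rng.dirichlet(alpha, size=group_size)


def fuse_scores(task_scores, weights):
    """Assumed linear fusion: weighted sum of per-task scores for each item."""
    return task_scores @ weights


def composite_reward(sparse_feedback, dense_prior, rule_violation,
                     prior_coef=0.5, rule_penalty=1.0):
    """Hypothetical additive blend of the three reward sources.

    sparse_feedback: observed user signal (e.g. 1.0 for an order, else 0.0)
    dense_prior:     model-predicted score, available for every sample
    rule_violation:  1.0 if a rule-based constraint is broken, else 0.0
    """
    return sparse_feedback + prior_coef * dense_prior - rule_penalty * rule_violation


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group, so no learned critic is needed."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha = np.array([2.0, 2.0, 2.0])   # assumed policy output for 3 tasks
    group = sample_weight_group(alpha, group_size=4, rng=rng)

    # Toy per-item task scores (rows: items, columns: tasks).
    task_scores = rng.random((5, 3))
    fused = np.stack([fuse_scores(task_scores, w) for w in group])

    # Toy reward inputs for the four candidates in the group.
    rewards = [
        composite_reward(1.0, 0.4, 0.0),
        composite_reward(0.0, 0.6, 0.0),
        composite_reward(0.0, 0.2, 1.0),
        composite_reward(1.0, 0.1, 0.0),
    ]
    print(fused.shape, group_relative_advantages(rewards))
```

In a full training loop, the advantages would weight the log-probability of each sampled weight vector under the Dirichlet policy. How GRADE adapts the concentration parameters and combines its reward terms is not specified in the abstract, so those parts are left out here.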
