Jun 29, 2025

Multi-task Offline Reinforcement Learning for Online Advertising in Recommender Systems

arXiv:2506.23090v2 | 3 citations | h-index: 21 | KDD
Originality: Incremental advance
AI Analysis

This work tackles the optimization of online advertising in recommender systems. It offers a domain-specific solution that is an incremental advance over prior work.

The paper addresses problems that arise when offline reinforcement learning is applied to sparse advertising scenarios in recommender systems, such as overestimation and distributional shift. It proposes MTORL, a multi-task offline RL model that jointly handles channel recommendation and budget allocation. MTORL outperforms state-of-the-art methods in both offline and online experiments.

Online advertising in recommendation platforms has gained significant attention, with a predominant focus on channel recommendation and budget allocation strategies. However, current offline reinforcement learning (RL) methods face substantial challenges in sparse advertising scenarios. The main causes are severe overestimation, distributional shifts, and neglected budget constraints. To address these issues, we propose MTORL, a novel multi-task offline RL model that targets two key objectives: channel recommendation and budget allocation. First, we establish a Markov Decision Process (MDP) framework tailored to the nuances of advertising. Then, we develop a causal state encoder to capture dynamic user interests and temporal dependencies, facilitating offline RL through conditional sequence modeling. Causal attention mechanisms are introduced to enhance user sequence representations by identifying correlations among causal states. We employ multi-task learning to decode actions and rewards, simultaneously addressing channel recommendation and budget allocation. Notably, our framework includes an automated system for integrating these tasks into online advertising. Extensive experiments in offline and online environments demonstrate MTORL's superiority over state-of-the-art methods.
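The abstract names three mechanisms without giving equations: conditional sequence modeling, causal attention over user states, and multi-task decoding of actions and rewards. The Python sketch below shows generic textbook versions of each. It is not MTORL's implementation. It makes three assumptions:

- Sequences are conditioned on Decision Transformer-style returns-to-go.
- Attention is single-head scaled dot-product attention with a causal mask, so step t only sees steps up to t.
- The multi-task objective sums a cross-entropy loss for a discrete channel choice and a weighted squared error for the reward.

The function names and the `reward_weight` parameter are hypothetical.

```python
import math
from typing import List, Sequence


def returns_to_go(rewards: Sequence[float], gamma: float = 1.0) -> List[float]:
    """Hypothetical helper: R_t = r_t + gamma * R_{t+1}, the conditioning signal
    assumed here for return-conditioned sequence modeling."""
    rtg: List[float] = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        rtg.append(running)
    rtg.reverse()
    return rtg


def causal_attention(q: List[List[float]], k: List[List[float]],
                     v: List[List[float]]) -> List[List[float]]:
    """Hypothetical helper: single-head scaled dot-product attention in which
    step t attends only to steps 0..t (the causal mask)."""
    d = len(q[0])
    out: List[List[float]] = []
    for t, q_t in enumerate(q):
        # Scores against the current and earlier steps only.
        scores = [sum(a * b for a, b in zip(q_t, k[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(x - m) for x in scores]
        z = sum(weights)
        weights = [w / z for w in weights]
        # Weighted sum of the visible value vectors.
        out.append([sum(w * v[s][j] for s, w in enumerate(weights))
                    for j in range(len(v[0]))])
    return out


def multi_task_loss(channel_logits: List[float], channel_target: int,
                    reward_pred: float, reward_target: float,
                    reward_weight: float = 1.0) -> float:
    """Hypothetical joint objective: cross-entropy for the channel (action) head
    plus a weighted squared error for the reward head."""
    m = max(channel_logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in channel_logits))
    cross_entropy = log_z - channel_logits[channel_target]
    squared_error = (reward_pred - reward_target) ** 2
    return cross_entropy + reward_weight * squared_error


if __name__ == "__main__":
    print(returns_to_go([1.0, 0.0, 2.0]))  # [3.0, 2.0, 2.0]
```

The causal mask stops the encoder from reading future interactions when it builds the representation at step t. Sharing one encoder between both heads is the usual reason to train a single multi-task loss instead of two separate models. The relative weight of the two terms is a design choice the abstract does not specify.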
