MoTiAC: Multi-Objective Actor-Critics for Real-Time Bidding
This work addresses the problem of optimizing bidding strategies for advertisers and ad platforms by handling multiple goals simultaneously, representing an incremental advancement in applying reinforcement learning to real-time bidding.
The paper tackles the challenge of balancing multiple objectives in real-time bidding for online advertising by proposing MoTiAC, a multi-objective reinforcement learning algorithm, which achieves improved performance on a large-scale commercial dataset compared to recent approaches.
Online Real-Time Bidding (RTB) is a complex auction game in which advertisers compete to bid for ad impressions whenever a user request occurs. Considering display cost, Return on Investment (ROI), and other influential Key Performance Indicators (KPIs), large ad platforms try to balance the trade-offs among these goals dynamically. To address this challenge, we propose a Multi-ObjecTive Actor-Critics algorithm based on reinforcement learning (RL), named MoTiAC, for bidding optimization with multiple goals. In MoTiAC, objective-specific agents update the global network asynchronously with different goals and perspectives, leading to a robust bidding policy. Unlike previous RL models, MoTiAC can simultaneously fulfill multi-objective tasks in complicated bidding environments. In addition, we mathematically prove that our model converges to Pareto optimality. Finally, experiments on a large-scale real-world commercial dataset from Tencent verify the effectiveness of MoTiAC against a set of recent approaches.
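The idea of objective-specific agents asynchronously updating one shared set of parameters can be illustrated with a toy sketch. This is not the paper's implementation: the "global network" is reduced to a single scalar, the two objectives (`revenue`, `roi`) and their target values are hypothetical stand-ins, each loss is a simple quadratic, and asynchrony is simulated by randomly interleaving workers in one thread.

```python
import random

# Hypothetical stand-ins for two bidding objectives (not from the paper):
# each objective prefers a different value of one bid-scaling parameter.
TARGETS = {"revenue": 2.0, "roi": 0.5}


def objective_gradient(name, theta):
    """Gradient of the toy loss (theta - target)^2 for one objective."""
    return 2.0 * (theta - TARGETS[name])


def train(steps=2000, lr=0.01, seed=0):
    rng = random.Random(seed)
    global_theta = 0.0  # shared "global network", reduced to one scalar
    names = list(TARGETS)
    for _ in range(steps):
        # Pick whichever objective-specific worker "finishes" next; this
        # random interleaving simulates asynchronous updates in one thread.
        worker = rng.choice(names)
        local_theta = global_theta  # worker syncs with the global copy
        grad = objective_gradient(worker, local_theta)
        global_theta -= lr * grad  # worker pushes its own objective's update
    return global_theta


theta = train()
print(f"shared parameter after training: {theta:.3f}")
```

In this one-dimensional toy, the shared parameter settles between the two objectives' preferred values. Any point in that interval is Pareto-optimal for the two quadratic losses, since moving in either direction worsens one of them. The actual MoTiAC model uses actor-critic networks and its own update rule, which this sketch does not attempt to reproduce.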