Enhancing Generative Auto-bidding with Offline Reward Evaluation and Policy Search
This work addresses a specific limitation of existing AIGB methods for advertiser auto-bidding and offers an incremental improvement over them.
The paper tackles a performance bottleneck in AI-Generated Bidding (AIGB) methods for auto-bidding: they cannot explore beyond static offline datasets. It proposes AIGB-Pearl, a method that integrates generative planning and policy optimization and achieves state-of-the-art performance in experiments on simulated and real-world advertising systems.
Auto-bidding serves as a critical tool for advertisers to improve their advertising performance. Recent progress has demonstrated that AI-Generated Bidding (AIGB), which learns a conditional generative planner from offline data, achieves superior performance compared to typical offline reinforcement learning (RL)-based auto-bidding methods. However, existing AIGB methods still face a performance bottleneck due to their inherent inability to explore beyond the static offline dataset. To address this, we propose AIGB-Pearl (Planning with EvaluAtor via RL), a novel method that integrates generative planning and policy optimization. The core of AIGB-Pearl lies in constructing a trajectory evaluator for scoring generation quality and designing a provably sound KL-Lipschitz-constrained score maximization scheme to ensure safe and efficient exploration beyond the offline dataset. A practical algorithm incorporating the synchronous coupling technique is further devised to ensure the model regularity required by the proposed scheme. Extensive experiments on both simulated and real-world advertising systems demonstrate the state-of-the-art performance of our approach.
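The abstract gives no algorithmic detail, but the KL half of a KL-constrained score maximization can be illustrated with a toy. The sketch below is not AIGB-Pearl's algorithm: it assumes a finite set of candidate trajectories, a fixed prior over them standing in for the offline-trained planner, and hypothetical evaluator scores, and it leaves out the Lipschitz constraint and the RL training of the planner entirely. It relies on the standard result that maximizing expected score subject to KL(pi || prior) <= eps is solved by an exponential tilt of the prior, pi(i) proportional to prior(i) * exp(lam * score(i)), with lam found by bisection so the KL budget is met. The function names `tilt`, `kl`, and `kl_constrained_max` are hypothetical.

```python
import math
from typing import List, Sequence


def tilt(prior: Sequence[float], scores: Sequence[float], lam: float) -> List[float]:
    """Exponential tilt: pi(i) proportional to prior(i) * exp(lam * score(i))."""
    # Shift by the best score on the prior's support so exp() never overflows.
    m = max(s for p, s in zip(prior, scores) if p > 0)
    # Entries outside the prior's support stay at zero probability.
    w = [p * math.exp(lam * (s - m)) if p > 0 else 0.0 for p, s in zip(prior, scores)]
    z = sum(w)
    return [x / z for x in w]


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p(i) = 0 contribute 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def kl_constrained_max(prior: Sequence[float], scores: Sequence[float], eps: float,
                       lam_hi: float = 1e3, iters: int = 100) -> List[float]:
    """Maximize E_pi[score] subject to KL(pi || prior) <= eps (toy, hypothetical).

    KL of the tilted distribution grows monotonically with lam, so bisection
    finds the largest lam whose tilt still respects the budget.
    """
    greedy = tilt(prior, scores, lam_hi)
    if kl(greedy, prior) <= eps:
        return greedy  # budget never binds up to lam_hi: return the near-greedy tilt
    lo, hi = 0.0, lam_hi  # lam = 0 reproduces the prior, so lo is always feasible
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if kl(tilt(prior, scores, mid), prior) <= eps:
            lo = mid
        else:
            hi = mid
    return tilt(prior, scores, lo)


if __name__ == "__main__":
    prior = [0.5, 0.3, 0.2]   # hypothetical offline-planner probabilities
    scores = [1.0, 2.0, 3.0]  # hypothetical evaluator scores
    policy = kl_constrained_max(prior, scores, eps=0.05)
    print([round(x, 3) for x in policy], round(kl(policy, prior), 4))
```

In this toy, the budget eps keeps the tilted policy close to the offline planner, so it cannot put much mass on trajectories the prior considers unlikely merely because the evaluator scores them highly; a larger eps permits more aggressive exploitation of the evaluator.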