LG GTJun 2, 2022

Incrementality Bidding via Reinforcement Learning under Mixed and Delayed Rewards

Ashwinkumar Badanidiyuru, Zhe Feng, Tianxi Li, Haifeng Xu

arXiv:2206.01293v27.86 citationsh-index: 19

Originality Incremental advance

AI Analysis

This work addresses the challenge of mixed and delayed rewards in online advertising bidding for advertisers, representing an incremental improvement with a novel algorithm for a specific bottleneck.

The paper tackles the problem of optimizing bidding sequences for online advertising without prior knowledge of incrementality parameters, proposing a reinforcement learning algorithm that achieves a regret bound of Õ(H²√T) independent of the number of actions.

Incrementality, which is used to measure the causal effect of showing an ad to a potential customer (e.g. a user in an internet platform) versus not, is a central object for advertisers in online advertising platforms. This paper investigates the problem of how an advertiser can learn to optimize the bidding sequence in an online manner \emph{without} knowing the incrementality parameters in advance. We formulate the offline version of this problem as a specially structured episodic Markov Decision Process (MDP) and then, for its online learning counterpart, propose a novel reinforcement learning (RL) algorithm with regret at most $\widetilde{O}(H^2\sqrt{T})$, which depends on the number of rounds $H$ and number of episodes $T$, but does not depend on the number of actions (i.e., possible bids). A fundamental difference between our learning problem from standard RL problems is that the realized reward feedback from conversion incrementality is \emph{mixed} and \emph{delayed}. To handle this difficulty we propose and analyze a novel pairwise moment-matching algorithm to learn the conversion incrementality, which we believe is of independent of interest.

View on arXiv PDF

Similar