LGGTJun 2, 2022

Incrementality Bidding via Reinforcement Learning under Mixed and Delayed Rewards

arXiv:2206.01293v26 citationsh-index: 19
AI Analysis

This work addresses the challenge of mixed and delayed rewards in online advertising bidding for advertisers, representing an incremental improvement with a novel algorithm for a specific bottleneck.

The paper tackles the problem of optimizing bidding sequences for online advertising without prior knowledge of incrementality parameters, proposing a reinforcement learning algorithm that achieves a regret bound of Õ(H²√T) independent of the number of actions.

Incrementality, which is used to measure the causal effect of showing an ad to a potential customer (e.g. a user in an internet platform) versus not, is a central object for advertisers in online advertising platforms. This paper investigates the problem of how an advertiser can learn to optimize the bidding sequence in an online manner \emph{without} knowing the incrementality parameters in advance. We formulate the offline version of this problem as a specially structured episodic Markov Decision Process (MDP) and then, for its online learning counterpart, propose a novel reinforcement learning (RL) algorithm with regret at most $\widetilde{O}(H^2\sqrt{T})$, which depends on the number of rounds $H$ and number of episodes $T$, but does not depend on the number of actions (i.e., possible bids). A fundamental difference between our learning problem from standard RL problems is that the realized reward feedback from conversion incrementality is \emph{mixed} and \emph{delayed}. To handle this difficulty we propose and analyze a novel pairwise moment-matching algorithm to learn the conversion incrementality, which we believe is of independent of interest.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes