Capturing Delayed Feedback in Conversion Rate Prediction via Elapsed-Time Sampling
This work provides a method to improve the accuracy of conversion rate prediction for online advertising platforms by addressing the trade-off between label accuracy and data freshness due to delayed feedback.
This paper addresses the delayed feedback problem in conversion rate (CVR) prediction for digital advertising, where a conversion may occur long after the user click, so a freshly collected sample can be mislabeled as negative. The authors propose the Elapsed-Time Sampling Delayed Feedback Model (ES-DFM), which models the relationship between the observed and true conversion distributions and optimizes the expectation over the true conversion distribution via importance sampling. Experiments on a public dataset and a private industrial dataset show that ES-DFM consistently outperforms previous state-of-the-art methods.
Conversion rate (CVR) prediction is one of the most critical tasks in digital display advertising. Commercial systems often need to update models in an online learning manner to keep up with the evolving data distribution. However, conversions usually do not happen immediately after a user click. This can result in inaccurate labeling, which is known as the delayed feedback problem. In previous studies, the delayed feedback problem is handled either by waiting a long period of time for the positive label, or by consuming the negative sample on its arrival and then inserting a positive duplicate when a conversion happens later. There is, however, a trade-off between waiting for more accurate labels and utilizing fresh data, which existing works do not consider. To strike a balance in this trade-off, we propose the Elapsed-Time Sampling Delayed Feedback Model (ES-DFM), which models the relationship between the observed conversion distribution and the true conversion distribution. We then optimize the expectation over the true conversion distribution via importance sampling under the elapsed-time sampling distribution. We further estimate the importance weight for each instance, which is used as the weight of the loss function in CVR prediction. To demonstrate the effectiveness of ES-DFM, we conduct extensive experiments on a public dataset and a private industrial dataset. Experimental results confirm that our method consistently outperforms the previous state-of-the-art methods.
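The idea of using a per-instance importance weight as the weight of the loss function can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `importance_weighted_log_loss` and its parameters are hypothetical, and the weights are taken as given inputs, whereas ES-DFM estimates them from the data in a way the abstract does not specify.

```python
import math


def importance_weighted_log_loss(observed_labels, predicted_cvr, weights, eps=1e-7):
    """Average binary cross-entropy with one importance weight per instance.

    observed_labels: 0/1 labels as observed at training time (possibly delayed).
    predicted_cvr:   model outputs in (0, 1).
    weights:         hypothetical importance weights, assumed to be supplied
                     by a separate estimator.
    """
    total = 0.0
    for y, p, w in zip(observed_labels, predicted_cvr, weights):
        # Clip the prediction so the logarithms stay finite.
        p = min(max(p, eps), 1.0 - eps)
        loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += w * loss
    return total / len(observed_labels)


# With all weights equal to 1 this reduces to the ordinary log loss.
plain = importance_weighted_log_loss([1, 0], [0.5, 0.5], [1.0, 1.0])
```

Weights above 1 make an instance count more toward the loss, and weights below 1 make it count less. This is how a loss computed on observed labels can be corrected toward the expectation under the true conversion distribution.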