LG CR IR Jan 29, 2022

Challenges and approaches to privacy preserving post-click conversion prediction

Conor O'Brien, Arvind Thiagarajan, Sourav Das, Rafael Barreto, Chetan Verma, Tim Hsu, James Neufield, Jonathan J Hunt

arXiv:2201.12666v1 10.413 citations

Originality: Incremental advance

AI Analysis

This paper addresses privacy-preserving machine learning for advertisers facing regulatory and technical restrictions on user tracking. The advance is incremental, as it builds on existing conversion prediction tasks.

The paper tackles the problem of training conversion prediction models under privacy constraints, where individual user data is unavailable, by introducing a novel approach using post-ranking signals. The method outperforms models relying on opt-in data alone and significantly reduces degradation when no individual labels are available, as shown in offline experiments on real-world data.

Online advertising has typically been more personalized than offline advertising, through the use of machine learning models and real-time auctions for ad targeting. One specific task, predicting the likelihood of conversion (i.e., the probability a user will purchase the advertised product), is crucial to the advertising ecosystem for both targeting and pricing ads. Currently, these models are often trained by observing individual user behavior, but, increasingly, regulatory and technical constraints are requiring privacy-preserving approaches. For example, major platforms are moving to restrict tracking individual user events across multiple applications, and governments around the world have shown steadily more interest in regulating the use of personal data. Instead of receiving data about individual user behavior, advertisers may receive privacy-preserving feedback, such as the number of installs of an advertised app that resulted from a group of users. In this paper we outline the recent privacy-related changes in the online advertising ecosystem from a machine learning perspective. We provide an overview of the challenges and constraints when learning conversion models in this setting. We introduce a novel approach for training these models that makes use of post-ranking signals. We show using offline experiments on real-world data that it outperforms a model relying on opt-in data alone, and significantly reduces model degradation when no individual labels are available. Finally, we discuss future directions for research in this evolving area.
