LG CR IR Jan 26, 2024

Training Differentially Private Ad Prediction Models with Semi-Sensitive Features

Lynn Chua, Qiliang Cui, Badih Ghazi, Charlie Harrison, Pritish Kamath, Walid Krichene, Ravi Kumar, Pasin Manurangsi, Krishna Giri Narra, Amer Sinha, Avinash Varadarajan, Chiyuan Zhang

arXiv:2401.15246v1 10.47 citations AdKDD@KDD

Originality: Incremental advance

AI Analysis

This work addresses privacy challenges in digital advertising by enabling more efficient use of semi-sensitive data, though it is incremental as it builds on existing DP frameworks.

The paper tackles the problem of training differentially private machine learning models when some features are known to attackers and others are private, introducing a new algorithm that outperforms baselines like DP-SGD and label DP methods in utility on real ads datasets.

Motivated by problems arising in digital advertising, we introduce the task of training differentially private (DP) machine learning models with semi-sensitive features. In this setting, a subset of the features is known to the attacker (and thus need not be protected) while the remaining features as well as the label are unknown to the attacker and should be protected by the DP guarantee. This task interpolates between training the model with full DP (where the label and all features should be protected) and training it with label DP (where all the features are considered known, and only the label should be protected). We present a new algorithm for training DP models with semi-sensitive features. Through an empirical evaluation on real ads datasets, we demonstrate that our algorithm achieves higher utility than two baselines: (i) DP stochastic gradient descent (DP-SGD) run on all features (known and unknown), and (ii) a label DP algorithm run only on the known features (while discarding the unknown ones).
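
The setting described in the abstract can be written as a DP guarantee under a restricted notion of adjacent datasets. The block below is a sketch under assumptions: the notation (x^k for known features, x^u for unknown features, y for the label) and the substitution-style adjacency are illustrative choices not taken from this page, and the paper's formal definition may differ in its details.

```latex
% Assumed notation: each example is a triple (x^{k}, x^{u}, y) of
% known features, unknown features, and label.
%
% Assumed adjacency: D and D' are adjacent if they differ in a single
% example i, and that example keeps its known features:
\[
  D  = \{(x^{k}_j, x^{u}_j, y_j)\}_{j=1}^{n}, \qquad
  D' = D \text{ with } (x^{k}_i, x^{u}_i, y_i)
       \text{ replaced by } (x^{k}_i, \tilde{x}^{u}_i, \tilde{y}_i).
\]
% A randomized training algorithm M is (\varepsilon, \delta)-DP in this
% setting if, for all adjacent D, D' and all measurable output sets S,
\[
  \Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S] + \delta .
\]
% Special cases matching the interpolation in the abstract:
%   x^{k} empty: every feature and the label may change  -> full DP.
%   x^{u} empty: only the label y_i may change           -> label DP.
```

Under this reading, moving features from the unknown part to the known part shrinks the set of adjacent dataset pairs the guarantee has to cover. That is one way to see why an algorithm tailored to this setting could need less noise than DP-SGD run on all features.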
