Regression with Label Differential Privacy
This work addresses privacy-preserving regression for applications where the labels must be protected, and it contributes an incremental improvement to label DP mechanisms.
The paper tackles training regression models under label differential privacy (DP). Given a prior distribution over label values, it derives a randomization mechanism that is optimal for a given regression loss, shows that this mechanism takes the form of a randomized response on bins, and demonstrates its efficacy through experiments on several datasets.
We study the task of training regression models with the guarantee of label differential privacy (DP). Based on a global prior distribution on label values, which could be obtained privately, we derive a label DP randomization mechanism that is optimal under a given regression loss function. We prove that the optimal mechanism takes the form of a "randomized response on bins", and propose an efficient algorithm for finding the optimal bin values. We carry out a thorough experimental evaluation on several datasets demonstrating the efficacy of our algorithm.
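To make the "randomized response on bins" idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes the bin values are already given (the abstract's algorithm for finding optimal bin values is not reproduced here), and it assigns each label to its nearest bin value, which is an illustrative choice rather than the optimal assignment. The function name `rr_on_bins` and its parameters are hypothetical. The randomization step is standard k-ary randomized response over the bin values, which satisfies epsilon-DP for the label.

```python
import math
import random


def rr_on_bins(y, bin_values, epsilon, rng=random):
    """Randomize a single label y via k-ary randomized response on bin values.

    Assumptions (not from the source): bin_values is a non-empty list of
    representative label values, and y is mapped to its nearest bin value.
    """
    k = len(bin_values)
    # Illustrative mapping: index of the bin value closest to the true label.
    true_idx = min(range(k), key=lambda i: abs(bin_values[i] - y))
    if k == 1:
        # Only one possible output, so the label is never revealed.
        return bin_values[0]

    # Keep the true bin with probability e^eps / (e^eps + k - 1).
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_keep:
        return bin_values[true_idx]

    # Otherwise output one of the other k - 1 bin values uniformly at random.
    other = rng.randrange(k - 1)
    if other >= true_idx:
        other += 1  # skip over the true bin
    return bin_values[other]


def randomize_labels(labels, bin_values, epsilon, rng=random):
    """Apply rr_on_bins independently to every label in a dataset."""
    return [rr_on_bins(y, bin_values, epsilon, rng) for y in labels]
```

A regression model would then be trained on the features paired with the output of `randomize_labels` in place of the true labels. Any two input labels produce a given output with probabilities whose ratio is at most e^epsilon, which is where the label DP guarantee of this step comes from.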