LG · ML · May 23, 2024

Private Regression via Data-Dependent Sufficient Statistic Perturbation

arXiv:2405.15002v1 · 10.44 citations · h-index: 4 · Has Code · Trans. Mach. Learn. Res.

Originality: Incremental advance

AI Analysis

This work addresses privacy-preserving machine learning for regression tasks, offering incremental improvements over existing SSP approaches.

The paper tackles the problem of improving differentially private linear and logistic regression by introducing data-dependent sufficient statistic perturbation (SSP), which outperforms state-of-the-art data-independent SSP methods.
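
For reference, here is a minimal sketch of a data-independent SSP baseline. It assumes a simple Gaussian-noise variant rather than any specific state-of-the-art method. Rows are clipped so that one record has bounded influence, symmetric Gaussian noise is added to the Gram matrix of [X, y], and the model is fit from the noisy X^T X and X^T y alone. The noise scale `sigma` is taken as an input (calibrating it to a target (ε, δ) is omitted), and all function names are hypothetical. The logistic variant uses a second-order Taylor surrogate of the logistic loss. This is an assumed illustration of "an objective expressed in terms of sufficient statistics", not necessarily the approximation the paper develops.

```python
import numpy as np


def clip_data(X, y):
    """Scale rows of X to L2 norm <= 1 and clip y to [-1, 1], so each
    augmented row z = [x, y] has ||z||^2 <= 2 (this bounds the sensitivity)."""
    Xc = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1.0)
    return Xc, np.clip(y, -1.0, 1.0)


def noisy_sufficient_stats(X, y, sigma, rng):
    """Data-independent SSP: add symmetric Gaussian noise to Z^T Z, Z = [X, y].
    Calibrating sigma to (epsilon, delta) is assumed to happen elsewhere."""
    Xc, yc = clip_data(X, y)
    Z = np.column_stack([Xc, yc])
    E = rng.normal(0.0, sigma, size=(Z.shape[1], Z.shape[1]))
    E = np.triu(E) + np.triu(E, 1).T  # one noise draw per unique entry
    G = Z.T @ Z + E
    d = X.shape[1]
    return G[:d, :d], G[:d, d]  # noisy X^T X and noisy X^T y


def ssp_linear(X, y, sigma, lam=0.0, rng=None):
    """Ridge regression computed only from the noisy sufficient statistics."""
    rng = np.random.default_rng() if rng is None else rng
    XtX, Xty = noisy_sufficient_stats(X, y, sigma, rng)
    return np.linalg.solve(XtX + lam * np.eye(X.shape[1]), Xty)


def ssp_logistic_quadratic(X, y, sigma, lam=0.0, rng=None):
    """Logistic regression (labels in {-1, +1}) via the assumed surrogate
    log(1 + exp(-s)) ~= log 2 - s/2 + s^2/8 with s = y * theta^T x.
    With penalty (lam/2)||theta||^2 the minimizer solves
    (X^T X + 4 lam I) theta = 2 X^T y, which needs only X^T X and X^T y."""
    rng = np.random.default_rng() if rng is None else rng
    XtX, Xty = noisy_sufficient_stats(X, y, sigma, rng)
    return np.linalg.solve(XtX + 4.0 * lam * np.eye(X.shape[1]), 2.0 * Xty)


# Usage on toy data (rows already have norm < 1, so clipping changes nothing).
rng = np.random.default_rng(0)
X = rng.uniform(-0.5, 0.5, size=(500, 3))
w = np.array([0.3, -0.2, 0.1])
theta_lin = ssp_linear(X, X @ w, sigma=0.0, rng=rng)  # sigma=0: no privacy, exact OLS
labels = np.sign(X @ np.array([1.0, -1.0, 0.5]))
theta_log = ssp_logistic_quadratic(X, labels, sigma=0.0, rng=rng)
```

With `sigma=0.0` the linear fit reduces to ordinary least squares, which makes the sketch easy to check; any privacy guarantee depends entirely on how `sigma` is calibrated against the clipping bound.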

Sufficient statistic perturbation (SSP) is a widely used method for differentially private linear regression. SSP adopts a data-independent approach where privacy noise from a simple distribution is added to sufficient statistics. However, sufficient statistics can often be expressed as linear queries and better approximated by data-dependent mechanisms. In this paper we introduce data-dependent SSP for linear regression based on post-processing privately released marginals, and find that it outperforms state-of-the-art data-independent SSP. We extend this result to logistic regression by developing an approximate objective that can be expressed in terms of sufficient statistics, resulting in a novel and highly competitive SSP approach for logistic regression. We also make a connection to synthetic data for machine learning: for models with sufficient statistics, training on synthetic data corresponds to data-dependent SSP, with the overall utility determined by how well the mechanism answers these linear queries.
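
The observation that sufficient statistics are linear queries can be sketched for discretized data. Suppose each attribute, target included, takes values on a finite grid. Then every entry of Z^T Z, with Z = [X, y], is a weighted sum over a 1-way or 2-way marginal count table, so privately released marginals can be post-processed into X^T X and X^T y. The sketch below assumes bin-center values. It uses independent Gaussian noise on each table only as a placeholder: the data-dependent mechanism that would release the marginals in the paper's approach is not implemented here. The helper names, grid sizes, `sigma`, and ridge term are hypothetical.

```python
import itertools

import numpy as np


def two_way_marginal(codes, j, k, sizes):
    """Count table M[a, b] = number of rows with codes[:, j] == a and codes[:, k] == b."""
    M = np.zeros((sizes[j], sizes[k]))
    np.add.at(M, (codes[:, j], codes[:, k]), 1.0)
    return M


def gram_from_marginals(one_way, two_way, values):
    """Assemble Z^T Z from marginal tables; every entry is a linear query.

    one_way[j]: count vector of attribute j.
    two_way[(j, k)]: count table of attributes j < k.
    values[j]: numeric value (e.g., bin center) for each code of attribute j.
    """
    p = len(values)
    G = np.zeros((p, p))
    for j in range(p):
        G[j, j] = (values[j] ** 2) @ one_way[j]
    for j, k in itertools.combinations(range(p), 2):
        G[j, k] = G[k, j] = values[j] @ two_way[(j, k)] @ values[k]
    return G


# Toy discretized data: columns 0..p-2 are features, the last column is the target.
rng = np.random.default_rng(0)
sizes = [5, 4, 6]
values = [np.linspace(-1.0, 1.0, s) for s in sizes]
n, p = 1000, len(sizes)
codes = np.column_stack([rng.integers(0, s, n) for s in sizes])
Z = np.column_stack([values[j][codes[:, j]] for j in range(p)])

one_way = [np.bincount(codes[:, j], minlength=sizes[j]).astype(float) for j in range(p)]
two_way = {(j, k): two_way_marginal(codes, j, k, sizes)
           for j, k in itertools.combinations(range(p), 2)}

# Exact marginals reproduce the sufficient statistics exactly.
G_exact = gram_from_marginals(one_way, two_way, values)

# Placeholder privatization (uncalibrated sigma): independent Gaussian noise
# on each table. A data-dependent marginal mechanism would replace this step.
sigma = 5.0
noisy_one = [m + rng.normal(0.0, sigma, m.shape) for m in one_way]
noisy_two = {key: m + rng.normal(0.0, sigma, m.shape) for key, m in two_way.items()}
G_noisy = gram_from_marginals(noisy_one, noisy_two, values)

d = p - 1
lam = 1.0  # small ridge term for numerical stability (an assumed choice)
theta = np.linalg.solve(G_noisy[:d, :d] + lam * np.eye(d), G_noisy[:d, d])
```

Because the final solve reads only the marginal tables, marginals from any differentially private mechanism can be plugged in without changing the post-processing; the accuracy of X^T X and X^T y, and hence of the fitted model, then depends on how well that mechanism answers these particular linear queries.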
