LG · ML · May 23, 2024

Private Regression via Data-Dependent Sufficient Statistic Perturbation

arXiv:2405.15002v1 · 4 citations · h-index: 4 · Trans. Mach. Learn. Res.
Originality: Incremental advance
AI Analysis

This work addresses privacy-preserving machine learning for regression tasks, offering incremental improvements over existing sufficient statistic perturbation (SSP) approaches.

The paper aims to improve differentially private linear and logistic regression by introducing data-dependent SSP, which the authors find outperforms state-of-the-art data-independent SSP methods.

Sufficient statistic perturbation (SSP) is a widely used method for differentially private linear regression. SSP adopts a data-independent approach where privacy noise from a simple distribution is added to sufficient statistics. However, sufficient statistics can often be expressed as linear queries and better approximated by data-dependent mechanisms. In this paper we introduce data-dependent SSP for linear regression based on post-processing privately released marginals, and find that it outperforms state-of-the-art data-independent SSP. We extend this result to logistic regression by developing an approximate objective that can be expressed in terms of sufficient statistics, resulting in a novel and highly competitive SSP approach for logistic regression. We also make a connection to synthetic data for machine learning: for models with sufficient statistics, training on synthetic data corresponds to data-dependent SSP, with the overall utility determined by how well the mechanism answers these linear queries.
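
To make the abstract's contrast concrete, here is a minimal sketch, not the authors' implementation. It assumes least-squares regression with sufficient statistics X^T X and X^T y. The function names (`gaussian_ssp`, `xtx_from_marginals`), the `ridge` stabilizer, and the noise scale `sigma` are illustrative assumptions; in particular, `sigma` is taken as given, and no sensitivity analysis or privacy accounting is performed. The first function shows data-independent SSP with Gaussian noise. The second shows why X^T X is a set of linear queries for discrete features: each entry is a weighted sum over a two-way marginal, so a data-dependent approach could substitute privately released marginals (not implemented here) for the exact counts.

```python
import numpy as np

def gaussian_ssp(X, y, sigma, ridge=1e-3, rng=None):
    """Data-independent SSP sketch: perturb X^T X and X^T y, then solve.

    `sigma` is assumed to be calibrated elsewhere to the sensitivity of the
    statistics and the privacy budget; this function does no accounting.
    """
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    XtX, Xty = X.T @ X, X.T @ y
    # Symmetrize the noise so the perturbed X^T X stays symmetric.
    E = rng.normal(0.0, sigma, size=(d, d))
    E = np.triu(E) + np.triu(E, 1).T
    noisy_XtX = XtX + E
    noisy_Xty = Xty + rng.normal(0.0, sigma, size=d)
    # A small ridge term (an assumption here) keeps the solve stable.
    return np.linalg.solve(noisy_XtX + ridge * np.eye(d), noisy_Xty)

def xtx_from_marginals(X, domain):
    """Rebuild X^T X from two-way marginals of discrete features.

    Every feature is assumed to take values in `domain`. Exact counts are
    used here; a data-dependent mechanism would supply noisy marginals.
    """
    d = X.shape[1]
    out = np.zeros((d, d))
    for j in range(d):
        for k in range(d):
            # counts[a, b] = number of records with X[:, j] == domain[a]
            # and X[:, k] == domain[b] (a linear counting query).
            counts = np.array([
                [np.sum((X[:, j] == a) & (X[:, k] == b)) for b in domain]
                for a in domain
            ])
            # sum over value pairs of a * b * count(a, b)
            out[j, k] = domain @ counts @ domain
    return out
```

In this sketch, any error in the marginals propagates linearly into X^T X, which is consistent with the abstract's point that utility depends on how well the mechanism answers these linear queries.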

Code Implementations: 1 repo