CR | AI | LG | ML | Mar 19, 2024

Provable Privacy with Non-Private Pre-Processing

arXiv:2403.13041v4 | 5 citations | ICML
Originality: Incremental advance
AI Analysis

This work addresses a critical gap in privacy accounting for machine learning practitioners, offering a practical way to keep privacy guarantees provable in real-world pipelines. The advance is incremental, since it builds on existing DP methods.

The paper tackles the overlooked privacy cost of non-private, data-dependent pre-processing in differentially private machine learning pipelines. It proposes a framework to evaluate this additional cost, establishes upper bounds on the overall privacy guarantees using Smooth DP and the bounded sensitivity of the pre-processing algorithms, and provides explicit guarantees for algorithms such as data imputation and PCA.

When analysing Differentially Private (DP) machine learning pipelines, the potential privacy cost of data-dependent pre-processing is frequently overlooked in privacy accounting. In this work, we propose a general framework to evaluate the additional privacy cost incurred by non-private data-dependent pre-processing algorithms. Our framework establishes upper bounds on the overall privacy guarantees by utilising two new technical notions: a variant of DP termed Smooth DP and the bounded sensitivity of the pre-processing algorithms. In addition to the generic framework, we provide explicit overall privacy guarantees for multiple data-dependent pre-processing algorithms, such as data imputation, quantization, deduplication and PCA, when used in combination with several DP algorithms. Notably, this framework is also simple to implement, allowing direct integration into existing DP pipelines.
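To see why data-dependent pre-processing can carry a privacy cost at all, consider the following minimal sketch. It is not the paper's framework and does not compute any of its bounds. It only illustrates, on toy data, that mean imputation can turn two datasets that differ in one record into datasets that differ in several records, so a DP guarantee stated for single-record changes no longer applies directly. The helper names `mean_impute` and `hamming_distance` and the toy values are assumptions made for this illustration.

```python
from statistics import mean


def mean_impute(values):
    """Replace missing entries (None) with the mean of the observed entries.

    Hypothetical helper for illustration: the fill value depends on the
    whole dataset, which is what makes this step data-dependent.
    """
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def hamming_distance(a, b):
    """Count the positions at which two equal-length datasets differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Two neighbouring datasets: they differ only in the first record.
d1 = [0.0, 1.0, None, None, None]
d2 = [10.0, 1.0, None, None, None]

# Non-private pre-processing applied to each dataset.
p1 = mean_impute(d1)
p2 = mean_impute(d2)

before = hamming_distance(d1, d2)  # records that differ before imputation
after = hamming_distance(p1, p2)   # records that differ after imputation

print(f"differing records before imputation: {before}")
print(f"differing records after imputation:  {after}")
```

In this toy case a single changed record shifts the imputed mean, so every imputed entry changes as well. A DP algorithm run on the imputed data therefore sees inputs that are further apart than the original neighbouring datasets, which is the kind of effect the framework in the abstract sets out to account for.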

