LG AI CR · Oct 9, 2020

Bias and Variance of Post-processing in Differential Privacy

Keyu Zhu, Pascal Van Hentenryck, Ferdinando Fioretto

arXiv:2010.04327v113.246 citations

Originality Synthesis-oriented

AI Analysis

This work addresses a gap in understanding post-processing impacts for practitioners in privacy-sensitive domains like census data, though it is incremental as it builds on existing differential privacy principles.

The paper investigates the effects of post-processing on noise distribution in differential privacy, specifically analyzing bias and variance introduced by projecting privacy-preserving outputs onto convex feasible regions, with theoretical and empirical examination using census data release.

Post-processing immunity is a fundamental property of differential privacy: it enables the application of arbitrary data-independent transformations to the results of differentially private outputs without affecting their privacy guarantees. When query outputs must satisfy domain constraints, post-processing can be used to project the privacy-preserving outputs onto the feasible region. Moreover, when the feasible region is convex, a widely adopted class of post-processing steps is also guaranteed to improve accuracy. Post-processing has been applied successfully in many applications including census data-release, energy systems, and mobility. However, its effects on the noise distribution are poorly understood: it is often argued that post-processing may introduce bias and increase variance. This paper takes a first step towards understanding the properties of post-processing. It considers the release of census data and examines, both theoretically and empirically, the behavior of a widely adopted class of post-processing functions.
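To make the projection step concrete, the following is a minimal sketch and not the paper's own method or experimental setup. It assumes the Laplace mechanism with sensitivity 1 per count, and it takes the simplest convex feasible region, the non-negative orthant, for which the Euclidean projection is coordinate-wise clipping. The function names, the toy counts, and the parameter values are all hypothetical. The simulation estimates the per-count bias and the total squared error before and after projection.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def project_nonnegative(values):
    # Euclidean projection onto the convex set {x : x_i >= 0}
    # reduces to clipping each coordinate at zero.
    return [max(0.0, v) for v in values]


def simulate(true_counts, epsilon, trials, seed=0):
    # Assumption: each count has sensitivity 1, so the Laplace scale is 1/epsilon.
    rng = random.Random(seed)
    scale = 1.0 / epsilon
    n = len(true_counts)
    bias_noisy = [0.0] * n
    bias_proj = [0.0] * n
    mse_noisy = 0.0
    mse_proj = 0.0
    for _ in range(trials):
        noisy = [c + laplace_noise(scale, rng) for c in true_counts]
        proj = project_nonnegative(noisy)
        for i, c in enumerate(true_counts):
            bias_noisy[i] += (noisy[i] - c) / trials
            bias_proj[i] += (proj[i] - c) / trials
            mse_noisy += (noisy[i] - c) ** 2 / trials
            mse_proj += (proj[i] - c) ** 2 / trials
    return bias_noisy, bias_proj, mse_noisy, mse_proj


# Toy data (hypothetical): one count on the boundary of the feasible
# region and one far inside it.
bias_noisy, bias_proj, mse_noisy, mse_proj = simulate(
    [0.0, 50.0], epsilon=1.0, trials=20000
)
print("bias before projection:", bias_noisy)
print("bias after projection: ", bias_proj)
print("squared error before/after:", mse_noisy, mse_proj)
```

In this toy setting the true counts are feasible, so clipping can never move an output farther from the truth and the squared error does not increase. The count that sits on the boundary, however, picks up a positive bias, because negative noise is clipped away while positive noise is kept. The count far from the boundary is essentially unaffected.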

