CRLGMay 24, 2020

Continuous Release of Data Streams under both Centralized and Local Differential Privacy

arXiv:2005.11753v296 citations
AI Analysis

This addresses the challenge of continuous data release for applications like IoT or finance, offering a novel solution under local differential privacy, though it is incremental in improving existing DP methods.

The paper tackles the problem of privately publishing real-valued data streams under differential privacy by developing a framework that estimates a threshold to truncate large values and adds calibrated noise, resulting in a mechanism that outperforms state-of-the-art methods by 6-10 orders of magnitude in utility for range queries.

In this paper, we study the problem of publishing a stream of real-valued data satisfying differential privacy (DP). One major challenge is that the maximal possible value can be quite large; thus it is necessary to estimate a threshold so that numbers above it are truncated to reduce the amount of noise that is required to all the data. The estimation must be done based on the data in a private fashion. We develop such a method that uses the Exponential Mechanism with a quality function that approximates well the utility goal while maintaining a low sensitivity. Given the threshold, we then propose a novel online hierarchical method and several post-processing techniques. Building on these ideas, we formalize the steps into a framework for private publishing of stream data. Our framework consists of three components: a threshold optimizer that privately estimates the threshold, a perturber that adds calibrated noises to the stream, and a smoother that improves the result using post-processing. Within our framework, we design an algorithm satisfying the more stringent setting of DP called local DP (LDP). To our knowledge, this is the first LDP algorithm for publishing streaming data. Using four real-world datasets, we demonstrate that our mechanism outperforms the state-of-the-art by a factor of 6-10 orders of magnitude in terms of utility (measured by the mean squared error of answering a random range query).

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes