CR LG, Nov 25, 2022

M$^2$M: A general method to perform various data analysis tasks from a differentially private sketch

Florimond Houssiau, Vincent Schellekens, Antoine Chatalic, Shreyas Kumar Annamraju, Yves-Alexandre de Montjoye

arXiv:2211.14062v12.9, h-index: 18

Originality: Incremental advance

AI Analysis

This work addresses a problem faced by data analysts who need to explore sensitive datasets extensively while preserving privacy. The advance is incremental, as it builds on existing private sketching techniques.

The paper tackles the challenge of performing multiple data analysis tasks under differential privacy without incurring additional privacy loss. It introduces the M$^2$M method, which estimates moments, covariance matrices, histograms, and regression models from a single private sketch. The method is validated on artificial and real-world data, where it yields reliable estimates.

Differential privacy is the standard privacy definition for performing analyses over sensitive data. Yet, its privacy budget bounds the number of tasks an analyst can perform with reasonable accuracy, which makes it challenging to deploy in practice. This can be alleviated by private sketching, where the dataset is compressed into a single noisy sketch vector which can be shared with the analysts and used to perform arbitrarily many analyses. However, the algorithms to perform specific tasks from sketches must be developed on a case-by-case basis, which is a major impediment to their use. In this paper, we introduce the generic moment-to-moment (M$^2$M) method to perform a wide range of data exploration tasks from a single private sketch. Among other things, this method can be used to estimate empirical moments of attributes, the covariance matrix, counting queries (including histograms), and regression models. Our method treats the sketching mechanism as a black-box operation, and can thus be applied to a wide variety of sketches from the literature, widening their range of applications without further engineering or privacy loss, and removing some of the technical barriers to the wider adoption of sketches for data exploration under differential privacy. We validate our method with data exploration tasks on artificial and real-world data, and show that it can be used to reliably estimate statistics and train classification models from private sketches.
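To make the idea of a "single noisy sketch vector" concrete, here is a minimal illustration of one way such a sketch could be built. It is not the M$^2$M method and is not taken from the paper. It assumes a random Fourier feature map (one common choice in the sketching literature), the Laplace mechanism, a public dataset size n, and neighbouring datasets that differ by replacing one record. All function names and parameters are hypothetical.

```python
import math
import random


def draw_frequencies(m, d, sigma, rng):
    """Draw m random frequency vectors in R^d (hypothetical choice: Gaussian)."""
    return [[rng.gauss(0.0, 1.0 / sigma) for _ in range(d)] for _ in range(m)]


def features(x, freqs):
    """Random Fourier features of one record: (cos, sin) per frequency."""
    out = []
    for w in freqs:
        t = sum(wi * xi for wi, xi in zip(w, x))
        out.append(math.cos(t))
        out.append(math.sin(t))
    return out


def laplace(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_sketch(data, freqs, epsilon, rng):
    """Average the features over the dataset, then add Laplace noise.

    Assumption: replacing one record changes each of the `dim` entries of
    the average by at most 2 / n, so the L1 sensitivity is at most
    2 * dim / n (a loose bound). Noise scale = sensitivity / epsilon.
    """
    n = len(data)
    dim = 2 * len(freqs)
    sketch = [0.0] * dim
    for x in data:
        for j, v in enumerate(features(x, freqs)):
            sketch[j] += v / n
    scale = 2.0 * dim / (n * epsilon)
    return [s + laplace(scale, rng) for s in sketch]


if __name__ == "__main__":
    rng = random.Random(0)
    data = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(1000)]
    freqs = draw_frequencies(m=8, d=2, sigma=1.0, rng=rng)
    sketch = private_sketch(data, freqs, epsilon=1.0, rng=rng)
    print(len(sketch))  # one noisy vector of length 2 * m
```

Once released, the sketch can be post-processed any number of times without further privacy loss, which is the property the abstract relies on. This example uses floating-point noise for readability only; a real deployment would need a vetted differential privacy library.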
