CR · Oct 13, 2017

Differentially Private Query Learning: from Data Publishing to Model Publishing

Tianqing Zhu, Ping Xiong, Gang Li, Wanlei Zhou, Philip S. Yu

arXiv:1710.05095v1 · 4.51 citations

Originality: Incremental advance

AI Analysis

This paper addresses privacy-preserving data publishing for big data applications, offering a novel approach that enhances utility while maintaining rigorous privacy guarantees. The advance is incremental in that it adapts machine learning to an existing privacy framework.

The paper tackles the challenge of accurate non-interactive data publishing under differential privacy by shifting from releasing query results or synthetic datasets to publishing a prediction model trained on queries, which improves accuracy for both current and fresh queries. Experimental results show it outperforms traditional differential privacy methods in terms of Mean Absolute Value on large query sets.

With the development of Big Data and cloud data sharing, privacy-preserving data publishing has become one of the most important topics of the past decade. As one of the most influential privacy definitions, differential privacy provides a rigorous and provable privacy guarantee for data publishing. Differentially private interactive publishing achieves good performance in many applications; in the Big Data era, however, the curator has to release either a large number of queries in a batch or a synthetic dataset. To provide accurate non-interactive publishing results under the constraint of differential privacy, two challenges need to be tackled: how to decrease the correlation between large sets of queries, and how to predict results for fresh queries. Neither is easy to solve with the traditional differential privacy mechanism. This paper transforms the data publishing problem into a machine learning problem, in which queries are treated as training samples and a prediction model is released rather than query results or synthetic datasets. Once the model is published, it can be used to answer currently submitted queries and to predict results for fresh queries from the public. Compared with the traditional method, the proposed prediction model enhances the accuracy of query results for non-interactive publishing. Experimental results show that the proposed solution outperforms traditional differential privacy in terms of Mean Absolute Value on a large group of queries. This also suggests that the learning model can retain the utility of published queries while preserving privacy.
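The idea of treating queries as training samples can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes range-count queries over a one-dimensional attribute, perturbs a batch of answers with the standard Laplace mechanism, and uses a k-nearest-neighbour average as a stand-in prediction model. All function names, the dataset, and the parameter values are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def true_count(data, lo, hi):
    # Exact answer to the range-count query "how many records lie in [lo, hi]?"
    return sum(1 for x in data if lo <= x <= hi)

def noisy_training_set(data, queries, epsilon, rng):
    # Adding or removing one record changes each range count by at most 1,
    # so a batch of m queries has L1 sensitivity at most m.
    # Laplace noise with scale m / epsilon makes the whole batch epsilon-DP.
    scale = len(queries) / epsilon
    return [((lo, hi), true_count(data, lo, hi) + laplace_noise(scale, rng))
            for lo, hi in queries]

def knn_predict(model, query, k=5):
    # Stand-in "prediction model" (an assumption, not the paper's learner):
    # average the noisy answers of the k training queries closest to `query`.
    # It only reads the noisy answers, so it is post-processing of a DP release.
    lo, hi = query
    nearest = sorted(
        model, key=lambda s: (s[0][0] - lo) ** 2 + (s[0][1] - hi) ** 2
    )[:k]
    return sum(answer for _, answer in nearest) / len(nearest)

def demo(seed=0):
    rng = random.Random(seed)
    data = [rng.randint(0, 99) for _ in range(10_000)]   # hypothetical dataset
    queries = []
    for _ in range(200):                                 # hypothetical query batch
        lo = rng.randint(0, 99)
        queries.append((lo, rng.randint(lo, 99)))
    model = noisy_training_set(data, queries, epsilon=1.0, rng=rng)
    fresh = (20, 60)                                     # query not asked by design
    return true_count(data, *fresh), knn_predict(model, fresh)
```

Calling `demo()` returns the exact count and the model's prediction for one fresh query. Averaging over neighbouring queries trades some bias for lower variance, which is one simple way a learned model might answer fresh queries without spending additional privacy budget.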
