DS CR DB LG · Feb 6, 2014

Dual Query: Practical Private Query Release for High Dimensional Data

Marco Gaboardi, Emilio Jesús Gallego Arias, Justin Hsu, Aaron Roth, Zhiwei Steven Wu

arXiv:1402.1526v2 · 113 citations

Originality: Highly original

AI Analysis

This paper addresses the challenge of private data analysis on high-dimensional datasets and offers a practical solution with substantial performance gains.

The paper tackles the problem of answering many queries on high-dimensional datasets under differential privacy. It presents an algorithm that efficiently and accurately answers millions of queries on datasets such as Netflix, which has over 17,000 attributes, improving on the state of the art by multiple orders of magnitude.

We present a practical, differentially private algorithm for answering a large number of queries on high-dimensional datasets. Like all algorithms for this task, ours necessarily has worst-case complexity exponential in the dimension of the data. However, our algorithm packages the computationally hard step into a concisely defined integer program, which can be solved non-privately using standard solvers. We prove accuracy and privacy theorems for our algorithm, and then demonstrate experimentally that our algorithm performs well in practice. For example, our algorithm can efficiently and accurately answer millions of queries on the Netflix dataset, which has over 17,000 attributes; this is an improvement on the state of the art by multiple orders of magnitude.
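To make the structure described in the abstract concrete, here is a minimal, non-authoritative sketch in the spirit of a query-player/data-player game. The assumptions are as follows. Queries are 2-way conjunctions over binary attributes plus their negations. A "query player" keeps multiplicative weights over the queries and samples from them. A "data player" best-responds with a single record that maximizes the sum of the sampled queries, and the collected records form a synthetic dataset. All function names (`eval_query`, `answer`, `best_response`, `dual_query_sketch`) and parameters (`rounds`, `samples`, `eta`) are hypothetical. The abstract says the hard best-response step is solved as an integer program with a standard solver; this sketch swaps in brute-force enumeration over all 2^d records, which only works for tiny d. It also does not calibrate `eta`, `rounds` or `samples` to any privacy budget ε, so as written it carries no privacy guarantee.

```python
import itertools
import math
import random

# Hypothetical query class: a query is (attrs, negated). It evaluates to 1
# when every attribute in attrs is 1 (flipped when negated is True).
def eval_query(query, record):
    attrs, negated = query
    value = 1.0 if all(record[j] == 1 for j in attrs) else 0.0
    return 1.0 - value if negated else value

def answer(query, dataset):
    # Normalized counting query: fraction of records satisfying the query.
    return sum(eval_query(query, r) for r in dataset) / len(dataset)

def best_response(sampled, d):
    # Data player: a record maximizing the sum of the sampled queries.
    # Assumption: brute force over all 2^d records stands in for the
    # integer-program solver described in the abstract.
    best, best_score = None, -1.0
    for bits in itertools.product((0, 1), repeat=d):
        score = sum(eval_query(q, bits) for q in sampled)
        if score > best_score:  # strict '>' keeps the first maximizer
            best, best_score = bits, score
    return best

def dual_query_sketch(dataset, queries, rounds, samples, eta, seed=0):
    rng = random.Random(seed)
    d = len(dataset[0])
    true_answers = [answer(q, dataset) for q in queries]
    # Cumulative payoff sum_i (q(D) - q(x_i)) for each query.
    cumulative = [0.0] * len(queries)
    synthetic = []
    for _ in range(rounds):
        # Multiplicative weights favor queries the synthetic data answers badly;
        # subtracting the max keeps exp() from overflowing.
        m = max(cumulative)
        weights = [math.exp(eta * (c - m)) for c in cumulative]
        sampled = rng.choices(queries, weights=weights, k=samples)
        x = best_response(sampled, d)
        synthetic.append(x)
        for i, q in enumerate(queries):
            cumulative[i] += true_answers[i] - eval_query(q, x)
    return synthetic

# Usage on a toy 4-attribute dataset (illustrative data, not from the paper).
data = [(1, 1, 0, 1), (1, 0, 0, 1), (1, 1, 1, 0), (0, 1, 0, 1)]
qs = [(pair, neg) for pair in itertools.combinations(range(4), 2)
      for neg in (False, True)]
syn = dual_query_sketch(data, qs, rounds=50, samples=10, eta=0.5)
max_err = max(abs(answer(q, data) - answer(q, syn)) for q in qs)
print("max error over", len(qs), "queries:", max_err)
```

Including each query's negation lets the query player penalize synthetic records that answer a query too high as well as too low. The sketch's only access to the real data is through `true_answers`, which feed the sampling weights. In the paper's setting, that sampling step is where the privacy analysis applies.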

