Privacy Preserving Stream Analytics: The Marriage of Randomized Response and Approximate Computing
The paper addresses privacy concerns for users of stream processing applications, though the contribution is incremental because it builds on existing techniques.
The paper tackles the problem of preserving user privacy while enabling high-utility, low-latency stream analytics. It introduces PRIVAPPROX, a system that combines sampling and randomized response to provide zero-knowledge privacy guarantees, a bound the authors describe as tighter than differential privacy, together with near real-time processing.
How can users' privacy be preserved while supporting high-utility analytics for low-latency stream processing? To answer this question, we describe the design, implementation, and evaluation of PRIVAPPROX, a data analytics system for privacy-preserving stream processing. PRIVAPPROX provides three properties: (i) Privacy: zero-knowledge privacy guarantees for users, a privacy bound tighter than the state-of-the-art differential privacy; (ii) Utility: an interface for data analysts to systematically explore the trade-offs between the output accuracy (with error estimation) and query execution budget; (iii) Latency: near real-time stream processing based on a scalable "synchronization-free" distributed architecture. The key idea behind our approach is to marry two existing techniques: sampling (used in the context of approximate computing) and randomized response (used in the context of privacy-preserving analytics). The resulting marriage is complementary: it achieves stronger privacy guarantees and also improves performance, a necessary ingredient for achieving low-latency stream analytics.
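To make the combination of sampling and randomized response concrete, the following is a minimal sketch, not the PRIVAPPROX implementation. It assumes a yes/no query and a common two-coin variant of randomized response: a client takes part with sampling probability s, answers truthfully with probability p, and otherwise answers "yes" with probability q. The parameter names s, p, q and the functions client_response and estimate_yes_count are hypothetical and do not come from the text above.

```python
import random


def client_response(truth, s, p, q, rng):
    """Return None if the client is not sampled, else a randomized yes/no answer."""
    # Sampling step (assumption): the client participates with probability s.
    if rng.random() >= s:
        return None
    # Randomized response, first coin: answer truthfully with probability p.
    if rng.random() < p:
        return truth
    # Second coin: otherwise answer "yes" with probability q, regardless of truth.
    return rng.random() < q


def estimate_yes_count(responses, s, p, q):
    """Estimate the number of true "yes" answers in the whole population."""
    answered = [r for r in responses if r is not None]
    n = len(answered)
    observed_yes = sum(answered)
    # Remove the expected number of forced "yes" answers, then undo the
    # truthful-answer probability p to estimate true "yes" among participants.
    yes_in_sample = (observed_yes - (1 - p) * q * n) / p
    # Scale up from the sampled participants to the full population.
    return yes_in_sample / s


# Synthetic population: 20,000 clients, 6,000 of whom truly answer "yes".
rng = random.Random(0)
truths = [i < 6000 for i in range(20000)]
responses = [client_response(t, 0.6, 0.7, 0.5, rng) for t in truths]
estimate = estimate_yes_count(responses, 0.6, 0.7, 0.5)
print(f"true yes count: 6000, estimate: {estimate:.0f}")
```

In this sketch, the aggregator never learns whether any individual answer is truthful, yet the population-level count can still be estimated. Lowering s or p adds noise to the estimate while reducing what each response reveals, which is the kind of accuracy versus budget trade-off the abstract refers to.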