Collecting Telemetry Data Privately
This work addresses privacy risks to users in software telemetry collection and offers a practical solution for repeated data collection, though it builds incrementally on existing LDP frameworks.
The paper tackles the problem of preserving privacy when telemetry such as daily app usage statistics is collected repeatedly. It develops new locally differentially private (LDP) mechanisms that keep formal privacy guarantees over arbitrarily long collection periods while achieving accuracy comparable to single-round methods for mean and histogram estimation. Microsoft has deployed the mechanisms to collect telemetry across millions of devices.
The collection and analysis of telemetry data from users' devices is routinely performed by many software companies. Telemetry collection leads to improved user experience but poses significant risks to users' privacy. Locally differentially private (LDP) algorithms have recently emerged as the main tool that allows data collectors to estimate various population statistics, while preserving privacy. The guarantees provided by such algorithms are typically very strong for a single round of telemetry collection, but degrade rapidly when telemetry is collected regularly. In particular, existing LDP algorithms are not suitable for repeated collection of counter data such as daily app usage statistics. In this paper, we develop new LDP mechanisms geared towards repeated collection of counter data, with formal privacy guarantees even after being executed for an arbitrarily long period of time. For two basic analytical tasks, mean estimation and histogram estimation, our LDP mechanisms for repeated data collection provide estimates with comparable or even the same accuracy as existing single-round LDP collection mechanisms. We conduct empirical evaluation on real-world counter datasets to verify our theoretical results. Our mechanisms have been deployed by Microsoft to collect telemetry across millions of devices.
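To make the single-round baseline concrete, the following is a minimal sketch of a standard one-bit LDP mean estimator for bounded counters. It is an illustration under stated assumptions, not necessarily the exact mechanism deployed: the function names `one_bit_response` and `estimate_mean`, the counter bound `m`, and the privacy parameter `epsilon` are all hypothetical choices made here. Each device reports a single randomized bit, and the collector debiases the sum of bits into an unbiased mean estimate. Running this every day with fresh randomness would degrade the privacy guarantee, so the sketch does not by itself provide the repeated-collection guarantees described above.

```python
import math
import random


def one_bit_response(x, m, epsilon, rng):
    """Randomize a counter x in [0, m] into one bit (assumed epsilon-LDP sketch).

    The probability of reporting 1 grows linearly with x, from
    1/(e^eps + 1) at x = 0 to e^eps/(e^eps + 1) at x = m, so the
    likelihood ratio between any two inputs is at most e^eps.
    """
    if not 0 <= x <= m:
        raise ValueError("x must lie in [0, m]")
    e = math.exp(epsilon)
    p_one = 1.0 / (e + 1.0) + (x / m) * (e - 1.0) / (e + 1.0)
    return 1 if rng.random() < p_one else 0


def estimate_mean(bits, m, epsilon):
    """Debias the collected bits into an unbiased estimate of the mean counter."""
    e = math.exp(epsilon)
    n = len(bits)
    return (m / n) * sum((b * (e + 1.0) - 1.0) / (e - 1.0) for b in bits)


if __name__ == "__main__":
    rng = random.Random(0)
    m, epsilon = 100, 1.0
    # Synthetic counters standing in for daily usage values (illustrative only).
    counters = [rng.randint(0, m) for _ in range(100_000)]
    bits = [one_bit_response(x, m, epsilon, rng) for x in counters]
    print("true mean:     ", sum(counters) / len(counters))
    print("estimated mean:", estimate_mean(bits, m, epsilon))
```

The estimate is noisy for any single device but concentrates around the true mean as the number of devices grows, which is why this style of mechanism suits population-level statistics.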