Representation Learning for High-Dimensional Data Collection under Local Differential Privacy
This work addresses privacy-preserving data collection for industries that handle sensitive high-dimensional data, offering an incremental improvement over existing LDP methods.
The paper tackles high-dimensional data collection under local differential privacy (LDP), where the privacy-inducing noise destroys utility. It introduces an approach that uses representation learning to add noise on the low-dimensional manifold underlying the data, together with a denoising method for downstream model learning, and reports significantly better performance than current state-of-the-art LDP mechanisms.
The collection of individuals' data has become commonplace in many industries. Local differential privacy (LDP) offers a rigorous approach to preserving privacy whereby the individual privatises their data locally, allowing only their perturbed datum to leave their possession. LDP thus provides a provable privacy guarantee to the individual against both adversaries and database administrators. Existing LDP mechanisms have successfully been applied to low-dimensional data, but in high dimensions the privacy-inducing noise largely destroys the utility of the data. In this work, our contributions are two-fold: first, by adapting state-of-the-art techniques from representation learning, we introduce a novel approach to learning LDP mechanisms. These mechanisms add noise to powerful representations on the low-dimensional manifold underlying the data, thereby overcoming the prohibitive noise requirements of LDP in high dimensions. Second, we introduce a novel denoising approach for downstream model learning. The training of performant machine learning models using collected LDP data is a common goal for data collectors, and downstream model performance forms a proxy for the LDP data utility. Our approach significantly outperforms current state-of-the-art LDP mechanisms.
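To make the dimensionality argument concrete, the following is a minimal sketch and not the paper's learned mechanism. It uses the standard Laplace mechanism on a clipped vector, where the noise scale grows linearly with the number of coordinates, and compares privatising raw features with privatising a short representation. The `encode` function is a hypothetical stand-in (simple block averaging) for a learned encoder, and all function names, the clipping bound, and the dimensions 784 and 8 are assumptions chosen for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def clip(x, bound=1.0):
    # Bound every coordinate so the L1 sensitivity is finite.
    return [max(-bound, min(bound, v)) for v in x]


def laplace_scale(dim, epsilon, bound=1.0):
    # Any two vectors in [-bound, bound]^dim differ by at most
    # 2 * bound * dim in L1 norm, so the per-coordinate Laplace
    # scale needed for epsilon-LDP is that sensitivity over epsilon.
    return 2.0 * bound * dim / epsilon


def privatise(x, epsilon, rng, bound=1.0):
    # Standard Laplace mechanism applied locally to one datum.
    x = clip(x, bound)
    scale = laplace_scale(len(x), epsilon, bound)
    return [v + laplace_noise(scale, rng) for v in x]


def encode(x, k):
    # Hypothetical encoder: average k equal-sized blocks of features.
    # Assumes len(x) is divisible by k. A learned representation
    # would replace this; block means of values in [-1, 1] stay in [-1, 1].
    block = len(x) // k
    return [sum(x[i * block:(i + 1) * block]) / block for i in range(k)]


rng = random.Random(0)
datum = [rng.uniform(-1.0, 1.0) for _ in range(784)]

raw_release = privatise(datum, epsilon=1.0, rng=rng)
latent_release = privatise(encode(datum, 8), epsilon=1.0, rng=rng)

# Per-coordinate noise scale at the same privacy budget.
print(laplace_scale(784, 1.0))  # 1568.0
print(laplace_scale(8, 1.0))    # 16.0
```

At the same privacy budget, the 8-dimensional release needs a per-coordinate noise scale 98 times smaller than the 784-dimensional one. Whether the representation retains enough information for downstream learning is the question the learned mechanisms in the paper are designed to address.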