Generative Adversarial User Privacy in Lossy Single-Server Information Retrieval
This work addresses user privacy in information retrieval for users who can tolerate some distortion in the retrieved data, offering an incremental improvement over existing non-learning methods.
This paper explores the trade-off between download rate, distortion, and user privacy leakage in private information retrieval, showing that for known data distributions it can be captured by an information-theoretic formulation. For unknown data statistics, the authors propose a deep learning framework based on a generative adversarial network, which significantly outperforms a non-learning baseline on the MNIST, CIFAR-10, and LSUN datasets.
We propose to extend the concept of private information retrieval by allowing for distortion in the retrieval process while also relaxing the perfect privacy requirement. In particular, we study the trade-off between download rate, distortion, and user privacy leakage, and show that in the limit of large file sizes this trade-off can be captured via a novel information-theoretic formulation for datasets with a known distribution. Moreover, for scenarios where the statistics of the dataset are unknown, we propose a new deep learning framework that leverages a generative adversarial network approach, allowing the user to learn efficient schemes from the data itself. We evaluate the performance of the scheme on a synthetic Gaussian dataset as well as on the MNIST, CIFAR-10, and LSUN datasets. For the MNIST, CIFAR-10, and LSUN datasets, the data-driven approach significantly outperforms a non-learning-based scheme that combines source coding with the download of multiple files.
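To make the rate/distortion/leakage trade-off concrete, the following is a minimal sketch of one plausible reading of a non-learning baseline that combines source coding with the download of multiple files. It is not the paper's construction and rests on assumptions the abstract does not state: files are independent memoryless Gaussian sources under squared-error distortion, compression achieves the Shannon rate-distortion limit, the desired file index is uniform, the user downloads the desired file plus uniformly random decoys, and leakage is measured as the mutual information between the desired index and the query. The function names `gaussian_rate` and `baseline_tradeoff` are hypothetical.

```python
import math


def gaussian_rate(variance: float, distortion: float) -> float:
    """Rate-distortion function of a memoryless Gaussian source under
    squared-error distortion, in bits per sample (assumed ideal coding)."""
    if distortion <= 0:
        raise ValueError("distortion must be positive")
    if distortion >= variance:
        return 0.0  # sending nothing already meets the distortion target
    return 0.5 * math.log2(variance / distortion)


def baseline_tradeoff(num_files: int, k: int, variance: float, distortion: float):
    """Hypothetical baseline: compress each file to the target distortion and
    download k files (the desired one plus k - 1 uniformly random decoys).

    Assumes the desired index is uniform over num_files and the k-subset is
    uniform among subsets containing it, so the server's posterior on the
    desired index is uniform over the k requested indices.
    Returns (download bits per sample, distortion, leakage in bits).
    """
    if not 1 <= k <= num_files:
        raise ValueError("need 1 <= k <= num_files")
    download = k * gaussian_rate(variance, distortion)
    # I(desired index; query) = log2(N) - log2(k) under the assumptions above
    leakage = math.log2(num_files / k)
    return download, distortion, leakage


if __name__ == "__main__":
    for k in (1, 2, 4, 8):
        rate, dist, leak = baseline_tradeoff(num_files=8, k=k, variance=1.0, distortion=0.25)
        print(f"k={k}: download={rate:.2f} bits/sample, distortion={dist}, leakage={leak:.2f} bits")
```

With 8 files, unit variance, and target distortion 0.25, each compressed file costs 1 bit per sample, so downloading k files costs k bits per sample while leakage falls as log2(8/k) bits: from 3 bits at k = 1 to 0 bits at k = 8, where perfect privacy is bought by downloading everything. Under this toy model the download cost grows linearly as leakage is reduced; the abstract reports that the learned, data-driven scheme outperforms this kind of baseline on MNIST, CIFAR-10, and LSUN.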