LG ITJul 13, 2012

The Price of Privacy in Untrusted Recommendation Engines

Siddhartha Banerjee, Nidhi Hegde, Laurent Massoulié

arXiv:1207.3269v219 citations

Originality Incremental advance

AI Analysis

This addresses privacy concerns in recommendation engines for users and platforms, offering theoretical insights into accuracy-privacy trade-offs, though it is incremental in extending privacy analysis to different information regimes.

The paper tackles the problem of learning item-clusters in recommender systems under local differential privacy, showing that sample complexity depends on whether users rate many or few items, with optimal algorithms developed for both regimes.

Recent increase in online privacy concerns prompts the following question: can a recommender system be accurate if users do not entrust it with their private data? To answer this, we study the problem of learning item-clusters under local differential privacy, a powerful, formal notion of data privacy. We develop bounds on the sample-complexity of learning item-clusters from privatized user inputs. Significantly, our results identify a sample-complexity separation between learning in an information-rich and an information-scarce regime, thereby highlighting the interaction between privacy and the amount of information (ratings) available to each user. In the information-rich regime, where each user rates at least a constant fraction of items, a spectral clustering approach is shown to achieve a sample-complexity lower bound derived from a simple information-theoretic argument based on Fano's inequality. However, the information-scarce regime, where each user rates only a vanishing fraction of items, is found to require a fundamentally different approach both for lower bounds and algorithms. To this end, we develop new techniques for bounding mutual information under a notion of channel-mismatch, and also propose a new algorithm, MaxSense, and show that it achieves optimal sample-complexity in this setting. The techniques we develop for bounding mutual information may be of broader interest. To illustrate this, we show their applicability to $(i)$ learning based on 1-bit sketches, and $(ii)$ adaptive learning, where queries can be adapted based on answers to past queries.

View on arXiv PDF

Similar