Private Learning with Public Features
This work addresses privacy in personalization tasks such as recommendation and ad prediction, where features about individuals are sensitive and must be protected, while features about items are public. It proposes algorithms that exploit this separation to improve utility over existing private methods.
The paper studies private learning when the data is a join of private and public features, as is common in recommendation and ad prediction. For multi-encoder models in which one encoder operates on public features, it develops algorithms that protect only certain sufficient statistics instead of adding noise to the gradient. The method has a guaranteed utility improvement for linear regression and achieves state-of-the-art results on two standard private recommendation benchmarks.
We study a class of private learning problems in which the data is a join of private and public features. This is often the case in private personalization tasks such as recommendation or ad prediction, in which features related to individuals are sensitive, while features related to items (the movies or songs to be recommended, or the ads to be shown to users) are publicly available and do not require protection. A natural question is whether private algorithms can achieve higher utility in the presence of public features. We give a positive answer for multi-encoder models where one of the encoders operates on public features. We develop new algorithms that take advantage of this separation by only protecting certain sufficient statistics (instead of adding noise to the gradient). This method has a guaranteed utility improvement for linear regression, and importantly, achieves the state of the art on two standard private recommendation benchmarks, demonstrating the importance of methods that adapt to the private-public feature separation.
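To make the idea of protecting sufficient statistics concrete, here is a minimal sketch in a deliberately simplified setting that is an assumption of this example, not the paper's multi-encoder algorithm: ridge regression where every feature is public and only the labels are private. In that case the Gram matrix X^T X depends only on public data and needs no noise, so only the moment vector X^T y is perturbed. The function name `fit_with_noisy_moment`, the parameters `clip`, `noise_multiplier`, and `l2_reg`, and the noise calibration are all hypothetical choices for illustration and do not constitute a verified privacy guarantee.

```python
import numpy as np


def fit_with_noisy_moment(X_pub, y_priv, clip, noise_multiplier, l2_reg, rng):
    """Ridge regression from sufficient statistics, noising only X^T y.

    Illustrative assumption: X_pub is fully public, y_priv is private.
    """
    # Clip private labels so a single label has bounded influence on X^T y.
    y = np.clip(y_priv, -clip, clip)

    # Public sufficient statistic: depends only on public features, no noise.
    gram = X_pub.T @ X_pub

    # Private sufficient statistic: depends on the private labels.
    moment = X_pub.T @ y

    # Assumed L2 sensitivity when one label is replaced: the label can move
    # by at most 2 * clip, scaled by the largest (public) feature-row norm.
    sensitivity = 2.0 * clip * np.max(np.linalg.norm(X_pub, axis=1))

    # Gaussian noise is added to the private statistic only.
    noise = rng.normal(0.0, noise_multiplier * sensitivity, size=moment.shape)
    noisy_moment = moment + noise

    # Solve the regularized normal equations with the noisy moment.
    d = X_pub.shape[1]
    return np.linalg.solve(gram + l2_reg * np.eye(d), noisy_moment)


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([0.5, -0.2, 0.1])
y = X @ w_true

# With the noise switched off, the solver recovers the noiseless solution.
w_exact = fit_with_noisy_moment(X, y, clip=10.0, noise_multiplier=0.0,
                                l2_reg=0.0, rng=rng)

# With noise on, only the d-dimensional moment vector is perturbed.
w_noisy = fit_with_noisy_moment(X, y, clip=10.0, noise_multiplier=1.0,
                                l2_reg=1.0, rng=rng)
```

In this sketch the noise is drawn once, in d dimensions, rather than at every gradient step, which gives some intuition for why adapting to the private-public separation can help utility.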