LG · Jul 28, 2022
Gender In Gender Out: A Closer Look at User Attributes in Context-Aware Recommendation
Manel Slokom, Özlem Özgöbek, Martha Larson
This paper studies user attributes in light of current concerns in the recommender system community: diversity, coverage, calibration, and data minimization. In experiments with a conventional context-aware recommender system that leverages side information, we show that user attributes do not always improve recommendation. Then, we demonstrate that user attributes can negatively impact diversity and coverage. Finally, we investigate the amount of information about users that "survives" from the training data into the recommendation lists produced by the recommender. This information is a weak signal that could in the future be exploited for calibration or studied further as a privacy leak.
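Catalog coverage, one of the concerns named above, is commonly measured as the share of catalog items that appear in at least one user's top-k list. The sketch below assumes that definition (the paper may use a different variant); the function name `catalog_coverage` and the toy data are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Set


def catalog_coverage(rec_lists: Dict[str, List[str]], catalog: Set[str]) -> float:
    """Fraction of catalog items recommended to at least one user."""
    if not catalog:
        return 0.0
    recommended: Set[str] = set()
    for items in rec_lists.values():
        recommended.update(items)
    # Only count items that actually belong to the catalog.
    return len(recommended & catalog) / len(catalog)


# Toy lists (hypothetical): a model that gives both users the same items covers less.
catalog = {"a", "b", "c", "d", "e"}
lists_without_attributes = {"u1": ["a", "b"], "u2": ["c", "d"]}
lists_with_attributes = {"u1": ["a", "b"], "u2": ["a", "b"]}
print(catalog_coverage(lists_without_attributes, catalog))  # 0.8
print(catalog_coverage(lists_with_attributes, catalog))  # 0.4
```

The numbers are illustrative only; they show how lists that converge on the same items lower coverage, not what the paper measured.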
LG · Oct 12, 2023
When Machine Learning Models Leak: An Exploration of Synthetic Training Data
Manel Slokom, Peter-Paul de Wolf, Martha Larson
We investigate an attack on a machine learning model that predicts whether a person or household will relocate in the next two years, i.e., a propensity-to-move classifier. The attack assumes that the attacker can query the model to obtain predictions and that the marginal distribution of the data on which the model was trained is publicly available. The attack also assumes that the attacker has obtained the values of non-sensitive attributes for a certain number of target individuals. The objective of the attack is to infer the values of sensitive attributes for these target individuals. We explore how replacing the original data with synthetic data when training the model impacts how successfully the attacker can infer sensitive attributes.
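The attack described above can be illustrated with a model-inversion-style procedure. This is an assumption-laden sketch, not the authors' attack: it additionally assumes the attacker knows each target's true label (whether they moved), tries every value of the sensitive attribute listed in the public marginal distribution, keeps the values for which the queried model reproduces that label, and breaks ties with the marginal probability. The function `infer_sensitive`, the toy model, and all attribute names are hypothetical.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]


def infer_sensitive(
    model: Callable[[Record], int],
    known: Record,
    known_label: int,
    sensitive_attr: str,
    marginal: Dict[Any, float],
) -> Any:
    """Guess one target's sensitive value by querying the model.

    Assumes the attacker knows the target's non-sensitive attributes
    (`known`), the target's true label, and the public marginal
    distribution of the sensitive attribute.
    """
    consistent = []
    for value in marginal:
        query = dict(known)
        query[sensitive_attr] = value
        # Keep candidate values under which the model reproduces the known label.
        if model(query) == known_label:
            consistent.append(value)
    # Fall back to all candidate values if none is consistent.
    pool = consistent or list(marginal)
    # Among the remaining candidates, pick the most common one in the population.
    return max(pool, key=lambda v: marginal[v])


# Hypothetical propensity-to-move model: predicts a move (1) for
# young, low-income households.
def toy_model(record: Record) -> int:
    return int(record["age"] < 30 and record["income"] == "low")


income_marginal = {"low": 0.3, "high": 0.7}
print(infer_sensitive(toy_model, {"age": 25}, 1, "income", income_marginal))  # low
print(infer_sensitive(toy_model, {"age": 40}, 0, "income", income_marginal))  # high
```

In this framing, the attacker's success depends on how closely the model's outputs track the real relationship between the sensitive attribute and the label, which is what training on synthetic data could change.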
IR · Apr 10
Beyond Centralization: User-Controlled Federated Recommendations in Practice
Manel Slokom, Alejandro Bellogin
Recommendation systems typically require centralized user data, limiting user control and raising privacy concerns. Federated learning offers an alternative by keeping data on-device, but its impact on real user behavior remains largely unexplored. We present a live federated recommender system that allows users to control the recommendation objective while keeping their data local. In a 53-day deployment with 22 participants and a catalog of 8807 titles, users interacted with recommendations and switched between personalization and diversity-enhanced ranking. We find that users prefer personalization when given explicit choice (65.37% vs. 62.07% CTR), actively engage with control mechanisms (3.93/5 satisfaction; 248 settings changes), and develop an understanding of how their interactions affect recommendations through immediate feedback. Our results show that user control, privacy, and effective personalization can be combined in a working system. We demonstrate a practical approach to interactive, privacy-preserving recommendation. Code and demo materials are available at: https://github.com/SlokomManel/federated-recommendations-participants
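To illustrate what keeping data on-device can mean for a matrix-factorization recommender, the sketch below assumes a common federated setup in which a server distributes shared item factors and each client fits only its own user vector from local interactions. This is not necessarily how the deployed system works; `local_update`, `rank`, and the toy factors are hypothetical. Many federated schemes also send item-factor updates back to the server for aggregation, a step omitted here.

```python
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def local_update(
    user_vec: Vector,
    item_vecs: Dict[str, Vector],
    local_ratings: Dict[str, float],
    lr: float = 0.05,
    reg: float = 0.01,
    epochs: int = 50,
) -> Vector:
    """Fit this user's factor vector with SGD on the device.

    `item_vecs` are shared factors assumed to come from a server;
    `local_ratings` are only read here and never sent anywhere.
    """
    u = list(user_vec)
    for _ in range(epochs):
        for item, rating in local_ratings.items():
            v = item_vecs[item]
            err = rating - dot(u, v)
            # L2-regularized gradient step on the user vector only.
            u = [uk + lr * (err * vk - reg * uk) for uk, vk in zip(u, v)]
    return u


def rank(user_vec: Vector, item_vecs: Dict[str, Vector]) -> List[str]:
    """Order items by predicted score, computed locally."""
    return sorted(item_vecs, key=lambda i: dot(user_vec, item_vecs[i]), reverse=True)


item_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
u = local_update([0.0, 0.0], item_vecs, {"a": 5.0, "b": 1.0})
print(rank(u, item_vecs))  # ['a', 'b']
```

A diversity-enhanced objective, as users could select in the deployment, would replace `rank` with a re-ranking step on the device; the personalization model itself would stay the same.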
IR · Jul 15, 2025
FedFlex: Federated Learning for Diverse Netflix Recommendations
Sven Lankester, Gustavo de Carvalho Bertoli, Matias Vizcaino et al.
The drive for personalization in recommender systems creates a tension between user privacy and the risk of "filter bubbles". Although federated learning offers a promising paradigm for privacy-preserving recommendations, its impact on diversity remains unclear. We introduce FedFlex, a two-stage framework that combines local, on-device fine-tuning of matrix factorization models (SVD and BPR) with a lightweight Maximal Marginal Relevance (MMR) re-ranking step to promote diversity. We conducted the first live user study of a federated recommender, collecting behavioral data and feedback during a two-week online deployment. Our results show that FedFlex successfully engages users, with BPR outperforming SVD in click-through rate. Re-ranking with MMR consistently improved ranking quality (nDCG) across both models, with statistically significant gains, particularly for BPR. Diversity effects varied: MMR increased coverage for both models and improved intra-list diversity for BPR, but slightly reduced it for SVD, suggesting different interactions between personalization and diversification across models. Responses to our exit questionnaire indicated that most users had no clear preference between re-ranked and unprocessed lists, implying that increased diversity did not substantially reduce user satisfaction.
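MMR builds a list greedily, at each step picking the candidate i that maximizes λ·rel(i) − (1 − λ)·max_{j∈S} sim(i, j), where S is the set of items already selected. The sketch below uses cosine similarity between item factor vectors and a trade-off parameter `lam`; these choices, the function names, and the toy data are assumptions, not details of FedFlex. It also computes intra-list diversity (ILD) as the mean pairwise cosine distance, one common definition.

```python
import math
from itertools import combinations
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def mmr_rerank(scores: Dict[str, float], item_vecs: Dict[str, Vector],
               k: int, lam: float = 0.7) -> List[str]:
    """Greedy MMR: trade relevance off against similarity to chosen items."""
    # Sorting first makes tie-breaking deterministic (max keeps the first maximum).
    remaining = sorted(scores, key=lambda i: scores[i], reverse=True)
    selected: List[str] = []
    while remaining and len(selected) < k:
        def mmr_score(i: str) -> float:
            redundancy = max((cosine(item_vecs[i], item_vecs[j]) for j in selected),
                             default=0.0)
            return lam * scores[i] - (1.0 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


def intra_list_diversity(items: List[str], item_vecs: Dict[str, Vector]) -> float:
    """Mean pairwise cosine distance within one list."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - cosine(item_vecs[i], item_vecs[j]) for i, j in pairs) / len(pairs)


# Hypothetical relevance scores and factors: "b" is a near-duplicate of "a".
scores = {"a": 0.9, "b": 0.85, "c": 0.5}
vecs = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}
reranked = mmr_rerank(scores, vecs, k=2, lam=0.5)
print(reranked)  # ['a', 'c']
print(intra_list_diversity(["a", "b"], vecs))  # 0.0
print(intra_list_diversity(reranked, vecs))  # 1.0
```

With `lam = 0.5`, MMR swaps the redundant `b` for the less relevant but dissimilar `c`; a larger `lam` keeps the list closer to the original relevance order.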
IR · Oct 7, 2021
Doing Data Right: How Lessons Learned Working with Conventional Data should Inform the Future of Synthetic Data for Recommender Systems
Manel Slokom, Martha Larson
We make the case that the newly emerging field of synthetic data in the area of recommender systems should prioritize 'doing data right'. We consider this catchphrase to have two aspects: first, we should not repeat the mistakes of the past, and second, we should explore the full scope of opportunities presented by synthetic data as we move into the future. We argue that explicit attention to dataset design and description will help to avoid past mistakes with dataset bias and evaluation. To fully exploit the opportunities of synthetic data, we point out that researchers can investigate new areas, such as using data synthesis to support reproducibility by making data open as well as FAIR, and to push forward our understanding of data minimization.
IR · Aug 9, 2020
Partially Synthetic Data for Recommender Systems: Prediction Performance and Preference Hiding
Manel Slokom, Martha Larson, Alan Hanjalic
This paper demonstrates the potential of statistical disclosure control for protecting the data used to train recommender systems. Specifically, we use a synthetic data generation approach to hide specific information in the user-item matrix. We apply a transformation to the original data that changes some values but leaves others the same. The result is a partially synthetic data set that can be used for recommendation but contains less specific information about individual user preferences. Synthetic data has the potential to be useful for companies that are interested in releasing data to allow outside parties to develop new recommender algorithms, e.g., in a recommender system challenge, while also reducing the risks associated with data misappropriation. Our experiments run a set of recommender system algorithms on our partially synthetic data sets as well as on the original data. The results show that the relative performance of the algorithms on the partially synthetic data reflects their relative performance on the original data. Further analysis demonstrates that properties of the original data are preserved under synthesis, but that, for certain example attributes, information accessible in the original data is hidden in the synthesized data.
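The idea of a transformation that changes some values but leaves others the same can be illustrated with a deliberately simple scheme: a randomly chosen fraction of observed ratings is replaced by a value drawn from the same item's observed ratings, while the set of observed user-item pairs is kept. This is a hypothetical stand-in for illustration, not the synthesis method used in the paper; `partially_synthesize` and its `fraction` parameter are assumptions.

```python
import random
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

Ratings = Dict[Tuple[str, str], int]  # (user, item) -> rating


def partially_synthesize(ratings: Ratings, fraction: float, seed: int = 0) -> Ratings:
    """Replace a random fraction of ratings; leave the rest untouched."""
    rng = random.Random(seed)
    # Collect each item's observed ratings to sample replacements from.
    by_item: DefaultDict[str, List[int]] = defaultdict(list)
    for (_, item), rating in ratings.items():
        by_item[item].append(rating)
    synthetic = dict(ratings)  # same user-item pairs as the original
    keys = sorted(ratings)
    n_replace = round(fraction * len(keys))
    for key in rng.sample(keys, n_replace):
        _, item = key
        # A replacement may coincide with the original value by chance.
        synthetic[key] = rng.choice(by_item[item])
    return synthetic


ratings = {("u1", "i1"): 5, ("u2", "i1"): 1, ("u3", "i1"): 3,
           ("u1", "i2"): 4, ("u2", "i2"): 2}
synthetic = partially_synthesize(ratings, fraction=0.4, seed=1)
changed = [key for key in ratings if synthetic[key] != ratings[key]]
print(len(changed) <= 2)  # True
```

Because replacements are drawn from each item's own ratings, item-level rating distributions tend to stay similar, while any single user's exact rating becomes less certain.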