IRJun 15, 2023
Inferring Communities of Interest in Collaborative Learning-based Recommender SystemsYacine Belal, Sonia Ben Mokhtar, Mohamed Maouche et al.
Collaborative-learning-based recommender systems, such as those employing Federated Learning (FL) and Gossip Learning (GL), allow users to train models while keeping their history of liked items on their devices. While these methods were seen as promising for enhancing privacy, recent research has shown that collaborative learning can be vulnerable to various privacy attacks. In this paper, we propose a novel attack called Community Inference Attack (CIA), which enables an adversary to identify community members based on a set of target items. What sets CIA apart is its efficiency: it operates at low computational cost by eliminating the need for training surrogate models. Instead, it uses a comparison-based approach, inferring sensitive information by comparing users' models rather than targeting any specific individual model. To evaluate the effectiveness of CIA, we conduct experiments on three real-world recommendation datasets using two recommendation models under both Federated and Gossip-like settings. The results demonstrate that CIA can be up to 10 times more accurate than random guessing. Additionally, we evaluate two mitigation strategies: Differentially Private Stochastic Gradient Descent (DP-SGD) and a Share less policy, which involves sharing fewer, less sensitive model parameters. Our findings suggest that the Share less strategy offers a better privacy-utility trade-off, especially in GL.
LGNov 4, 2024
Differentially private and decentralized randomized power methodJulien Nicolas, César Sabater, Mohamed Maouche et al.
The randomized power method has gained significant interest due to its simplicity and efficient handling of large-scale spectral analysis and recommendation tasks. However, its application to large datasets containing personal information (e.g., web interactions, search history, personal tastes) raises critical privacy problems. This paper addresses these issues by proposing enhanced privacy-preserving variants of the method. First, we propose a variant that reduces the amount of the noise required in current techniques to achieve Differential Privacy (DP). More precisely, we refine the privacy analysis so that the Gaussian noise variance no longer grows linearly with the target rank, achieving the same DP guarantees with strictly less noise. Second, we adapt our method to a decentralized framework in which data is distributed among multiple users. The decentralized protocol strengthens privacy guarantees with no accuracy penalty and a low computational and communication overhead. Our results include the provision of tighter convergence bounds for both the centralized and decentralized versions, and an empirical comparison with previous work using real recommendation datasets.
LGJul 4, 2025
Communication Efficient, Differentially Private Distributed Optimization using Correlation-Aware SketchingJulien Nicolas, Mohamed Maouche, Sonia Ben Mokhtar et al.
Federated learning with differential privacy suffers from two major costs: each client must transmit $d$-dimensional gradients every round, and the magnitude of DP noise grows with $d$. Yet empirical studies show that gradient updates exhibit strong temporal correlations and lie in a $k$-dimensional subspace with $k \ll d$. Motivated by this, we introduce DOME, a decentralized DP optimization framework in which each client maintains a compact sketch to project gradients into $\mathbb{R}^k$ before privatization and Secure Aggregation. This reduces per-round communication from order $d$ to order $k$ and moves towards a gradient approximation mean-squared error of $σ^2 k$. To allow the sketch to span new directions and prevent it from collapsing onto historical gradients, we augment it with random probes orthogonal to historical directions. We prove that our overall protocol satisfies $(ε,δ)$-Differential Privacy.
LGApr 24, 2025
GRANITE : a Byzantine-Resilient Dynamic Gossip Learning FrameworkYacine Belal, Mohamed Maouche, Sonia Ben Mokhtar et al.
Gossip Learning (GL) is a decentralized learning paradigm where users iteratively exchange and aggregate models with a small set of neighboring peers. Recent GL approaches rely on dynamic communication graphs built and maintained using Random Peer Sampling (RPS) protocols. Thanks to graph dynamics, GL can achieve fast convergence even over extremely sparse topologies. However, the robustness of GL over dy- namic graphs to Byzantine (model poisoning) attacks remains unaddressed especially when Byzantine nodes attack the RPS protocol to scale up model poisoning. We address this issue by introducing GRANITE, a framework for robust learning over sparse, dynamic graphs in the presence of a fraction of Byzantine nodes. GRANITE relies on two key components (i) a History-aware Byzantine-resilient Peer Sampling protocol (HaPS), which tracks previously encountered identifiers to reduce adversarial influence over time, and (ii) an Adaptive Probabilistic Threshold (APT), which leverages an estimate of Byzantine presence to set aggregation thresholds with formal guarantees. Empirical results confirm that GRANITE maintains convergence with up to 30% Byzantine nodes, improves learning speed via adaptive filtering of poisoned models and obtains these results in up to 9 times sparser graphs than dictated by current theory.
LGDec 17, 2024
Exposing the Vulnerability of Decentralized Learning to Membership Inference Attacks Through the Lens of Graph MixingOusmane Touat, Jezekael Brunon, Yacine Belal et al.
The primary promise of decentralized learning is to allow users to engage in the training of machine learning models in a collaborative manner while keeping their data on their premises and without relying on any central entity. However, this paradigm necessitates the exchange of model parameters or gradients between peers. Such exchanges can be exploited to infer sensitive information about training data, which is achieved through privacy attacks (e.g., Membership Inference Attacks -- MIA). In order to devise effective defense mechanisms, it is important to understand the factors that increase/reduce the vulnerability of a given decentralized learning architecture to MIA. In this study, we extensively explore the vulnerability to MIA of various decentralized learning architectures by varying the graph structure (e.g., number of neighbors), the graph dynamics, and the aggregation strategy, across diverse datasets and data distributions. Our key finding, which to the best of our knowledge we are the first to report, is that the vulnerability to MIA is heavily correlated to (i) the local model mixing strategy performed by each node upon reception of models from neighboring nodes and (ii) the global mixing properties of the communication graph. We illustrate these results experimentally using four datasets and by theoretically analyzing the mixing properties of various decentralized architectures. We also empirically show that enhancing mixing properties is highly beneficial when combined with other privacy-preserving techniques such as Differential Privacy. Our paper draws a set of lessons learned for devising decentralized learning systems that reduce by design the vulnerability to MIA.
SDFeb 23, 2022
Differentially Private Speaker AnonymizationAli Shahin Shamsabadi, Brij Mohan Lal Srivastava, Aurélien Bellet et al.
Sharing real-world speech utterances is key to the training and deployment of voice-based services. However, it also raises privacy risks as speech contains a wealth of personal data. Speaker anonymization aims to remove speaker information from a speech utterance while leaving its linguistic and prosodic attributes intact. State-of-the-art techniques operate by disentangling the speaker information (represented via a speaker embedding) from these attributes and re-synthesizing speech based on the speaker embedding of another speaker. Prior research in the privacy community has shown that anonymization often provides brittle privacy protection, even less so any provable guarantee. In this work, we show that disentanglement is indeed not perfect: linguistic and prosodic attributes still contain speaker information. We remove speaker information from these attributes by introducing differentially private feature extractors based on an autoencoder and an automatic speech recognizer, respectively, trained using noise layers. We plug these extractors in the state-of-the-art anonymization pipeline and generate, for the first time, private speech utterances with a provable upper bound on the speaker information they contain. We evaluate empirically the privacy and utility resulting from our differentially private speaker anonymization approach on the LibriSpeech data set. Experimental results show that the generated utterances retain very high utility for automatic speech recognition training and inference, while being much better protected against strong adversaries who leverage the full knowledge of the anonymization process to try to infer the speaker identity.
CLSep 1, 2021
The VoicePrivacy 2020 Challenge: Results and findingsNatalia Tomashenko, Xin Wang, Emmanuel Vincent et al.
This paper presents the results and analyses stemming from the first VoicePrivacy 2020 Challenge which focuses on developing anonymization solutions for speech technology. We provide a systematic overview of the challenge design with an analysis of submitted systems and evaluation results. In particular, we describe the voice anonymization task and datasets used for system development and evaluation. Also, we present different attack models and the associated objective and subjective evaluation metrics. We introduce two anonymization baselines and provide a summary description of the anonymization systems developed by the challenge participants. We report objective and subjective evaluation results for baseline and submitted systems. In addition, we present experimental results for alternative privacy metrics and attack models developed as a part of the post-evaluation analysis. Finally, we summarize our insights and observations that will influence the design of the next VoicePrivacy challenge edition and some directions for future voice anonymization research.
ASMay 18, 2020
Design Choices for X-vector Based Speaker AnonymizationBrij Mohan Lal Srivastava, Natalia Tomashenko, Xin Wang et al.
The recently proposed x-vector based anonymization scheme converts any input voice into that of a random pseudo-speaker. In this paper, we present a flexible pseudo-speaker selection technique as a baseline for the first VoicePrivacy Challenge. We explore several design choices for the distance metric between speakers, the region of x-vector space where the pseudo-speaker is picked, and gender selection. To assess the strength of anonymization achieved, we consider attackers using an x-vector based speaker verification system who may use original or anonymized speech for enrollment, depending on their knowledge of the anonymization scheme. The Equal Error Rate (EER) achieved by the attackers and the decoding Word Error Rate (WER) over anonymized data are reported as the measures of privacy and utility. Experiments are performed using datasets derived from LibriSpeech to find the optimal combination of design choices in terms of privacy and utility.