LGSep 22, 2022Code
Robust Collaborative Learning with Linear Gradient OverheadSadegh Farhadkhani, Rachid Guerraoui, Nirupam Gupta et al.
Collaborative learning algorithms, such as distributed SGD (or D-SGD), are prone to faulty machines that may deviate from their prescribed algorithm because of software or hardware bugs, poisoned data or malicious behaviors. While many solutions have been proposed to enhance the robustness of D-SGD to such machines, previous works either resort to strong assumptions (trusted server, homogeneous data, specific noise model) or impose a gradient computational cost that is several orders of magnitude higher than that of D-SGD. We present MoNNA, a new algorithm that (a) is provably robust under standard assumptions and (b) has a gradient computation overhead that is linear in the fraction of faulty machines, which is conjectured to be tight. Essentially, MoNNA uses Polyak's momentum of local gradients for local updates and nearest-neighbor averaging (NNA) for global mixing, respectively. While MoNNA is rather simple to implement, its analysis has been more challenging and relies on two key elements that may be of independent interest. Specifically, we introduce the mixing criterion of $(α, λ)$-reduction to analyze the non-linear mixing of non-faulty machines, and present a way to control the tension between the momentum and the model drifts. We validate our theory by experiments on image classification and make our code available at https://github.com/LPD-EPFL/robust-collaborative-learning.
LGMay 24, 2022
Byzantine Machine Learning Made Easy by Resilient Averaging of MomentumsSadegh Farhadkhani, Rachid Guerraoui, Nirupam Gupta et al.
Byzantine resilience emerged as a prominent topic within the distributed machine learning community. Essentially, the goal is to enhance distributed optimization algorithms, such as distributed SGD, in a way that guarantees convergence despite the presence of some misbehaving (a.k.a., {\em Byzantine}) workers. Although a myriad of techniques addressing the problem have been proposed, the field arguably rests on fragile foundations. These techniques are hard to prove correct and rely on assumptions that are (a) quite unrealistic, i.e., often violated in practice, and (b) heterogeneous, i.e., making it difficult to compare approaches. We present \emph{RESAM (RESilient Averaging of Momentums)}, a unified framework that makes it simple to establish optimal Byzantine resilience, relying only on standard machine learning assumptions. Our framework is mainly composed of two operators: \emph{resilient averaging} at the server and \emph{distributed momentum} at the workers. We prove a general theorem stating the convergence of distributed SGD under RESAM. Interestingly, demonstrating and comparing the convergence of many existing techniques become direct corollaries of our theorem, without resorting to stringent assumptions. We also present an empirical evaluation of the practical relevance of RESAM.
LGFeb 3, 2023
Fixing by Mixing: A Recipe for Optimal Byzantine ML under HeterogeneityYoussef Allouah, Sadegh Farhadkhani, Rachid Guerraoui et al.
Byzantine machine learning (ML) aims to ensure the resilience of distributed learning algorithms to misbehaving (or Byzantine) machines. Although this problem received significant attention, prior works often assume the data held by the machines to be homogeneous, which is seldom true in practical settings. Data heterogeneity makes Byzantine ML considerably more challenging, since a Byzantine machine can hardly be distinguished from a non-Byzantine outlier. A few solutions have been proposed to tackle this issue, but these provide suboptimal probabilistic guarantees and fare poorly in practice. This paper closes the theoretical gap, achieving optimality and inducing good empirical results. In fact, we show how to automatically adapt existing solutions for (homogeneous) Byzantine ML to the heterogeneous setting through a powerful mechanism, we call nearest neighbor mixing (NNM), which boosts any standard robust distributed gradient descent variant to yield optimal Byzantine resilience under heterogeneity. We obtain similar guarantees (in expectation) by plugging NNM in the distributed stochastic heavy ball method, a practical substitute to distributed gradient descent. We obtain empirical results that significantly outperform state-of-the-art Byzantine ML solutions.
LGSep 30, 2022
On the Impossible Safety of Large AI ModelsEl-Mahdi El-Mhamdi, Sadegh Farhadkhani, Rachid Guerraoui et al.
Large AI Models (LAIMs), of which large language models are the most prominent recent example, showcase some impressive performance. However they have been empirically found to pose serious security issues. This paper systematizes our knowledge about the fundamental impossibility of building arbitrarily accurate and secure machine learning models. More precisely, we identify key challenging features of many of today's machine learning settings. Namely, high accuracy seems to require memorizing large training datasets, which are often user-generated and highly heterogeneous, with both sensitive information and fake users. We then survey statistical lower bounds that, we argue, constitute a compelling case against the possibility of designing high-accuracy LAIMs with strong security guarantees.
LGFeb 9, 2023
On the Privacy-Robustness-Utility Trilemma in Distributed LearningYoussef Allouah, Rachid Guerraoui, Nirupam Gupta et al.
The ubiquity of distributed machine learning (ML) in sensitive public domain applications calls for algorithms that protect data privacy, while being robust to faults and adversarial behaviors. Although privacy and robustness have been extensively studied independently in distributed ML, their synthesis remains poorly understood. We present the first tight analysis of the error incurred by any algorithm ensuring robustness against a fraction of adversarial machines, as well as differential privacy (DP) for honest machines' data against any other curious entity. Our analysis exhibits a fundamental trade-off between privacy, robustness, and utility. To prove our lower bound, we consider the case of mean estimation, subject to distributed DP and robustness constraints, and devise reductions to centralized estimation of one-way marginals. We prove our matching upper bound by presenting a new distributed ML algorithm using a high-dimensional robust aggregation rule. The latter amortizes the dependence on the dimension in the error (caused by adversarial workers and DP), while being agnostic to the statistical properties of the data.
LGSep 11, 2023
SABLE: Secure And Byzantine robust LEarningAntoine Choffrut, Rachid Guerraoui, Rafael Pinot et al.
Due to the widespread availability of data, machine learning (ML) algorithms are increasingly being implemented in distributed topologies, wherein various nodes collaborate to train ML models via the coordination of a central server. However, distributed learning approaches face significant vulnerabilities, primarily stemming from two potential threats. Firstly, the presence of Byzantine nodes poses a risk of corrupting the learning process by transmitting inaccurate information to the server. Secondly, a curious server may compromise the privacy of individual nodes, sometimes reconstructing the entirety of the nodes' data. Homomorphic encryption (HE) has emerged as a leading security measure to preserve privacy in distributed learning under non-Byzantine scenarios. However, the extensive computational demands of HE, particularly for high-dimensional ML models, have deterred attempts to design purely homomorphic operators for non-linear robust aggregators. This paper introduces SABLE, the first homomorphic and Byzantine robust distributed learning algorithm. SABLE leverages HTS, a novel and efficient homomorphic operator implementing the prominent coordinate-wise trimmed mean robust aggregator. Designing HTS enables us to implement HMED, a novel homomorphic median aggregator. Extensive experiments on standard ML tasks demonstrate that SABLE achieves practical execution times while maintaining an ML accuracy comparable to its non-private counterpart.
LGMay 30, 2025Code
ByzFL: Research Framework for Robust Federated LearningMarc González, Rachid Guerraoui, Rafael Pinot et al.
We present ByzFL, an open-source Python library for developing and benchmarking robust federated learning (FL) algorithms. ByzFL provides a unified and extensible framework that includes implementations of state-of-the-art robust aggregators, a suite of configurable attacks, and tools for simulating a variety of FL scenarios, including heterogeneous data distributions, multiple training algorithms, and adversarial threat models. The library enables systematic experimentation via a single JSON-based configuration file and includes built-in utilities for result visualization. Compatible with PyTorch tensors and NumPy arrays, ByzFL is designed to facilitate reproducible research and rapid prototyping of robust FL solutions. ByzFL is available at https://byzfl.epfl.ch/, with source code hosted on GitHub: https://github.com/LPD-EPFL/byzfl.
LGMay 23, 2024
Overcoming the Challenges of Batch Normalization in Federated LearningRachid Guerraoui, Rafael Pinot, Geovani Rizk et al.
Batch normalization has proven to be a very beneficial mechanism to accelerate the training and improve the accuracy of deep neural networks in centralized environments. Yet, the scheme faces significant challenges in federated learning, especially under high data heterogeneity. Essentially, the main challenges arise from external covariate shifts and inconsistent statistics across clients. We introduce in this paper Federated BatchNorm (FBN), a novel scheme that restores the benefits of batch normalization in federated learning. Essentially, FBN ensures that the batch normalization during training is consistent with what would be achieved in a centralized execution, hence preserving the distribution of the data, and providing running statistics that accurately approximate the global statistics. FBN thereby reduces the external covariate shift and matches the evaluation performance of the centralized setting. We also show that, with a slight increase in complexity, we can robustify FBN to mitigate erroneous statistics and potentially adversarial attacks.
LGMay 23, 2024
Adaptive Gradient Clipping for Robust Federated LearningYoussef Allouah, Rachid Guerraoui, Nirupam Gupta et al.
Robust federated learning aims to maintain reliable performance despite the presence of adversarial or misbehaving workers. While state-of-the-art (SOTA) robust distributed gradient descent (Robust-DGD) methods were proven theoretically optimal, their empirical success has often relied on pre-aggregation gradient clipping. However, existing static clipping strategies yield inconsistent results: enhancing robustness against some attacks while being ineffective or even detrimental against others. To address this limitation, we propose a principled adaptive clipping strategy, Adaptive Robust Clipping (ARC), which dynamically adjusts clipping thresholds based on the input gradients. We prove that ARC not only preserves the theoretical robustness guarantees of SOTA Robust-DGD methods but also provably improves asymptotic convergence when the model is well-initialized. Extensive experiments on benchmark image classification tasks confirm these theoretical insights, demonstrating that ARC significantly enhances robustness, particularly in highly heterogeneous and adversarial settings.
LGMay 3, 2025
Towards Trustworthy Federated Learning with Untrusted ParticipantsYoussef Allouah, Rachid Guerraoui, John Stephan
Resilience against malicious participants and data privacy are essential for trustworthy federated learning, yet achieving both with good utility typically requires the strong assumption of a trusted central server. This paper shows that a significantly weaker assumption suffices: each pair of participants shares a randomness seed unknown to others. In a setting where malicious participants may collude with an untrusted server, we propose CafCor, an algorithm that integrates robust gradient aggregation with correlated noise injection, using shared randomness between participants. We prove that CafCor achieves strong privacy-utility trade-offs, significantly outperforming local differential privacy (DP) methods, which do not make any trust assumption, while approaching central DP utility, where the server is fully trusted. Empirical results on standard benchmarks validate CafCor's practicality, showing that privacy and robustness can coexist in distributed systems without sacrificing utility or trusting the server.
LGDec 22, 2023
Balancing Privacy, Robustness, and Efficiency in Machine LearningYoussef Allouah, Rachid Guerraoui, John Stephan
This position paper argues that achieving robustness, privacy, and efficiency simultaneously in machine learning systems is infeasible under prevailing threat models. The tension between these goals arises not from algorithmic shortcomings but from structural limitations imposed by worst-case adversarial assumptions. We advocate for a systematic research agenda aimed at formalizing the robustness-privacy-efficiency trilemma, exploring how principled relaxations of threat models can unlock better trade-offs, and designing benchmarks that expose rather than obscure the compromises made. By shifting focus from aspirational universal guarantees to context-aware system design, the machine learning community can build models that are truly appropriate for real-world deployment.
LGOct 8, 2021
Combining Differential Privacy and Byzantine Resilience in Distributed SGDRachid Guerraoui, Nirupam Gupta, Rafael Pinot et al.
Privacy and Byzantine resilience (BR) are two crucial requirements of modern-day distributed machine learning. The two concepts have been extensively studied individually but the question of how to combine them effectively remains unanswered. This paper contributes to addressing this question by studying the extent to which the distributed SGD algorithm, in the standard parameter-server architecture, can learn an accurate model despite (a) a fraction of the workers being malicious (Byzantine), and (b) the other fraction, whilst being honest, providing noisy information to the server to ensure differential privacy (DP). We first observe that the integration of standard practices in DP and BR is not straightforward. In fact, we show that many existing results on the convergence of distributed SGD under Byzantine faults, especially those relying on $(α,f)$-Byzantine resilience, are rendered invalid when honest workers enforce DP. To circumvent this shortcoming, we revisit the theory of $(α,f)$-BR to obtain an approximate convergence guarantee. Our analysis provides key insights on how to improve this guarantee through hyperparameter optimization. Essentially, our theoretical and empirical results show that (1) an imprudent combination of standard approaches to DP and BR might be fruitless, but (2) by carefully re-tuning the learning algorithm, we can obtain reasonable learning accuracy while simultaneously guaranteeing DP and BR.
LGFeb 16, 2021
Differential Privacy and Byzantine Resilience in SGD: Do They Add Up?Rachid Guerraoui, Nirupam Gupta, Rafaël Pinot et al.
This paper addresses the problem of combining Byzantine resilience with privacy in machine learning (ML). Specifically, we study if a distributed implementation of the renowned Stochastic Gradient Descent (SGD) learning algorithm is feasible with both differential privacy (DP) and $(α,f)$-Byzantine resilience. To the best of our knowledge, this is the first work to tackle this problem from a theoretical point of view. A key finding of our analyses is that the classical approaches to these two (seemingly) orthogonal issues are incompatible. More precisely, we show that a direct composition of these techniques makes the guarantees of the resulting SGD algorithm depend unfavourably upon the number of parameters of the ML model, making the training of large models practically infeasible. We validate our theoretical results through numerical experiments on publicly-available datasets; showing that it is impractical to ensure DP and Byzantine resilience simultaneously.