Rawad Bitar

LG
h-index25
19papers
78citations
Novelty57%
AI Score53

19 Papers

LGJun 25, 2023
Private Aggregation in Hierarchical Wireless Federated Learning with Partial and Full Collusion

Maximilian Egger, Christoph Hofmeister, Antonia Wachter-Zeh et al.

In federated learning, a federator coordinates the training of a model, e.g., a neural network, on privately owned data held by several participating clients. The gradient descent algorithm, a well-known and popular iterative optimization procedure, is run to train the model. Every client computes partial gradients based on their local data and sends them to the federator, which aggregates the results and updates the model. Privacy of the clients' data is a major concern. In fact, it is shown that observing the partial gradients can be enough to reveal the clients' data. Existing literature focuses on private aggregation schemes that tackle the privacy problem in federated learning in settings where all users are connected to each other and to the federator. In this paper, we consider a hierarchical wireless system architecture in which the clients are connected to base stations; the base stations are connected to the federator either directly or through relays. We examine settings with and without relays, and derive fundamental limits on the communication cost under information-theoretic privacy with different collusion assumptions. We introduce suitable private aggregation schemes tailored for these settings whose communication costs are multiplicative factors away from the derived bounds.

LGAug 4, 2022
Adaptive Stochastic Gradient Descent for Fast and Communication-Efficient Distributed Learning

Serge Kas Hanna, Rawad Bitar, Parimal Parag et al.

We consider the setting where a master wants to run a distributed stochastic gradient descent (SGD) algorithm on $n$ workers, each having a subset of the data. Distributed SGD may suffer from the effect of stragglers, i.e., slow or unresponsive workers who cause delays. One solution studied in the literature is to wait at each iteration for the responses of the fastest $k<n$ workers before updating the model, where $k$ is a fixed parameter. The choice of the value of $k$ presents a trade-off between the runtime (i.e., convergence rate) of SGD and the error of the model. Towards optimizing the error-runtime trade-off, we investigate distributed SGD with adaptive~$k$, i.e., varying $k$ throughout the runtime of the algorithm. We first design an adaptive policy for varying $k$ that optimizes this trade-off based on an upper bound on the error as a function of the wall-clock time that we derive. Then, we propose and implement an algorithm for adaptive distributed SGD that is based on a statistical heuristic. Our results show that the adaptive version of distributed SGD can reach lower error values in less time compared to non-adaptive implementations. Moreover, the results also show that the adaptive version is communication-efficient, where the amount of communication required between the master and the workers is less than that of non-adaptive versions.

LGJul 16, 2024
Self-Regulating Random Walks for Resilient Decentralized Learning on Graphs

Maximilian Egger, Rawad Bitar, Ghadir Ayache et al.

Consider the setting of multiple random walks (RWs) on a graph executing a certain computational task. For instance, in decentralized learning via RWs, a model is updated at each iteration based on the local data of the visited node and then passed to a randomly chosen neighbor. RWs can fail due to node or link failures. The goal is to maintain a desired number of RWs to ensure failure resilience. Achieving this is challenging due to the lack of a central entity to track which RWs have failed to replace them with new ones by forking (duplicating) surviving ones. Without duplications, the number of RWs will eventually go to zero, causing a catastrophic failure of the system. We propose two decentralized algorithms called DecAFork and DecAFork+ that can maintain the number of RWs in the graph around a desired value even in the presence of arbitrary RW failures. Nodes continuously estimate the number of surviving RWs by estimating their return time distribution and fork the RWs when failures are likely to happen. DecAFork+ additionally allows terminations to avoid overloading the network by forking too many RWs. We present extensive numerical simulations that show the performance of DecAFork and DecAFork+ regarding fast detection and reaction to failures compared to a baseline, and establish theoretical guarantees on the performance of both algorithms.

ITJul 16, 2024
Scalable and Reliable Over-the-Air Federated Edge Learning

Maximilian Egger, Christoph Hofmeister, Cem Kaya et al.

Federated edge learning (FEEL) has emerged as a core paradigm for large-scale optimization. However, FEEL still suffers from a communication bottleneck due to the transmission of high-dimensional model updates from the clients to the federator. Over-the-air computation (AirComp) leverages the additive property of multiple-access channels by aggregating the clients' updates over the channel to save communication resources. While analog uncoded transmission can benefit from the increased signal-to-noise ratio (SNR) due to the simultaneous transmission of many clients, potential errors may severely harm the learning process for small SNRs. To alleviate this problem, channel coding approaches were recently proposed for AirComp in FEEL. However, their error-correction capability degrades with an increasing number of clients. We propose a digital lattice-based code construction with constant error-correction capabilities in the number of clients, and compare to nested-lattice codes, well-known for their optimal rate and power efficiency in the point-to-point AWGN channel.

DCApr 17, 2023
Fast and Straggler-Tolerant Distributed SGD with Reduced Computation Load

Maximilian Egger, Serge Kas Hanna, Rawad Bitar

In distributed machine learning, a central node outsources computationally expensive calculations to external worker nodes. The properties of optimization procedures like stochastic gradient descent (SGD) can be leveraged to mitigate the effect of unresponsive or slow workers called stragglers, that otherwise degrade the benefit of outsourcing the computation. This can be done by only waiting for a subset of the workers to finish their computation at each iteration of the algorithm. Previous works proposed to adapt the number of workers to wait for as the algorithm evolves to optimize the speed of convergence. In contrast, we model the communication and computation times using independent random variables. Considering this model, we construct a novel scheme that adapts both the number of workers and the computation load throughout the run-time of the algorithm. Consequently, we improve the convergence speed of distributed SGD while significantly reducing the computation load, at the expense of a slight increase in communication load.

47.1ITMay 11
Random Access Expectation in DNA Storage and Fountain Codes

Christoph Hofmeister, Rawad Bitar, Eitan Yaakobi

Motivated by DNA data storage, we study the expected number of coded symbols drawn from a linear code until a desired information symbol can be decoded - the random access expectation. We focus on generator matrices with a type of symmetry, conjectured in prior work to be optimal, which we call fully symmetric. We point out an equivalence between binary fully symmetric codes and LT codes. Using this observation, we analyze the random access expectation of binary fully symmetric codes under a peeling decoder, in the large blocklength limit. Under these assumptions, the random access expectation, normalized by the number of information symbols, is at least π/4 {\approx} 0.7854, while a value of {\approx} 0.7869 is achievable.

ITMay 14, 2024
Byzantine-Resilient Secure Aggregation for Federated Learning Without Privacy Compromises

Yue Xia, Christoph Hofmeister, Maximilian Egger et al.

Federated learning (FL) shows great promise in large scale machine learning, but brings new risks in terms of privacy and security. We propose ByITFL, a novel scheme for FL that provides resilience against Byzantine users while keeping the users' data private from the federator and private from other users. The scheme builds on the preexisting non-private FLTrust scheme, which tolerates malicious users through trust scores (TS) that attenuate or amplify the users' gradients. The trust scores are based on the ReLU function, which we approximate by a polynomial. The distributed and privacy-preserving computation in ByITFL is designed using a combination of Lagrange coded computing, verifiable secret sharing and re-randomization steps. ByITFL is the first Byzantine resilient scheme for FL with full information-theoretic privacy.

LGJan 31, 2025
Byzantine-Resilient Zero-Order Optimization for Communication-Efficient Heterogeneous Federated Learning

Maximilian Egger, Mayank Bakshi, Rawad Bitar

We introduce CyBeR-0, a Byzantine-resilient federated zero-order optimization method that is robust under Byzantine attacks and provides significant savings in uplink and downlink communication costs. We introduce transformed robust aggregation to give convergence guarantees for general non-convex objectives under client data heterogeneity. Empirical evaluations for standard learning tasks and fine-tuning large language models show that CyBeR-0 exhibits stable performance with only a few scalars per-round communication cost and reduced memory requirements.

CRApr 29, 2025
Federated One-Shot Learning with Data Privacy and Objective-Hiding

Maximilian Egger, Rüdiger Urbanke, Rawad Bitar

Privacy in federated learning is crucial, encompassing two key aspects: safeguarding the privacy of clients' data and maintaining the privacy of the federator's objective from the clients. While the first aspect has been extensively studied, the second has received much less attention. We present a novel approach that addresses both concerns simultaneously, drawing inspiration from techniques in knowledge distillation and private information retrieval to provide strong information-theoretic privacy guarantees. Traditional private function computation methods could be used here; however, they are typically limited to linear or polynomial functions. To overcome these constraints, our approach unfolds in three stages. In stage 0, clients perform the necessary computations locally. In stage 1, these results are shared among the clients, and in stage 2, the federator retrieves its desired objective without compromising the privacy of the clients' data. The crux of the method is a carefully designed protocol that combines secret-sharing-based multi-party computation and a graph-based private information retrieval scheme. We show that our method outperforms existing tools from the literature when properly adapted to this setting.

LGJan 31, 2025
BICompFL: Stochastic Federated Learning with Bi-Directional Compression

Maximilian Egger, Rawad Bitar, Antonia Wachter-Zeh et al.

We address the prominent communication bottleneck in federated learning (FL). We specifically consider stochastic FL, in which models or compressed model updates are specified by distributions rather than deterministic parameters. Stochastic FL offers a principled approach to compression, and has been shown to reduce the communication load under perfect downlink transmission from the federator to the clients. However, in practice, both the uplink and downlink communications are constrained. We show that bi-directional compression for stochastic FL has inherent challenges, which we address by introducing BICompFL. Our BICompFL is experimentally shown to reduce the communication cost by an order of magnitude compared to multiple benchmarks, while maintaining state-of-the-art accuracies. Theoretically, we study the communication cost of BICompFL through a new analysis of an importance-sampling based technique, which exposes the interplay between uplink and downlink communication costs.

LGSep 11, 2025
ProDiGy: Proximity- and Dissimilarity-Based Byzantine-Robust Federated Learning

Sena Ergisi, Luis Maßny, Rawad Bitar

Federated Learning (FL) emerged as a widely studied paradigm for distributed learning. Despite its many advantages, FL remains vulnerable to adversarial attacks, especially under data heterogeneity. We propose a new Byzantine-robust FL algorithm called ProDiGy. The key novelty lies in evaluating the client gradients using a joint dual scoring system based on the gradients' proximity and dissimilarity. We demonstrate through extensive numerical experiments that ProDiGy outperforms existing defenses in various scenarios. In particular, when the clients' data do not follow an IID distribution, while other defense mechanisms fail, ProDiGy maintains strong defense capabilities and model accuracy. These findings highlight the effectiveness of a dual perspective approach that promotes natural similarity among honest clients while detecting suspicious uniformity as a potential indicator of an attack.

LGAug 18, 2025
Fed-DPRoC:Communication-Efficient Differentially Private and Robust Federated Learning

Yue Xia, Tayyebeh Jahani-Nezhad, Rawad Bitar

We propose Fed-DPRoC, a novel federated learning framework that simultaneously ensures differential privacy (DP), Byzantine robustness, and communication efficiency. We introduce the concept of robust-compatible compression, which enables users to compress DP-protected updates while maintaining the robustness of the aggregation rule. We instantiate our framework as RobAJoL, combining the Johnson-Lindenstrauss (JL) transform for compression with robust averaging for robust aggregation. We theoretically prove the compatibility of JL transform with robust averaging and show that RobAJoL preserves robustness guarantees, ensures DP, and reduces communication cost. Experiments on CIFAR-10 and Fashion MNIST validate our theoretical claims and demonstrate that RobAJoL outperforms existing methods in terms of robustness and utility under different Byzantine attacks.

LGJun 16, 2025
Perfect Privacy for Discriminator-Based Byzantine-Resilient Federated Learning

Yue Xia, Christoph Hofmeister, Maximilian Egger et al.

Federated learning (FL) shows great promise in large-scale machine learning but introduces new privacy and security challenges. We propose ByITFL and LoByITFL, two novel FL schemes that enhance resilience against Byzantine users while keeping the users' data private from eavesdroppers. To ensure privacy and Byzantine resilience, our schemes build on having a small representative dataset available to the federator and crafting a discriminator function allowing the mitigation of corrupt users' contributions. ByITFL employs Lagrange coded computing and re-randomization, making it the first Byzantine-resilient FL scheme with perfect Information-Theoretic (IT) privacy, though at the cost of a significant communication overhead. LoByITFL, on the other hand, achieves Byzantine resilience and IT privacy at a significantly reduced communication cost, but requires a Trusted Third Party, used only in a one-time initialization phase before training. We provide theoretical guarantees on privacy and Byzantine resilience, along with convergence guarantees and experimental results validating our findings.

LGJun 11, 2025
Private Aggregation for Byzantine-Resilient Heterogeneous Federated Learning

Maximilian Egger, Rawad Bitar

Ensuring resilience to Byzantine clients while maintaining the privacy of the clients' data is a fundamental challenge in federated learning (FL). When the clients' data is homogeneous, suitable countermeasures were studied from an information-theoretic perspective utilizing secure aggregation techniques while ensuring robust aggregation of the clients' gradients. However, the countermeasures used fail when the clients' data is heterogeneous. Suitable pre-processing techniques, such as nearest neighbor mixing, were recently shown to enhance the performance of those countermeasures in the heterogeneous setting. Nevertheless, those pre-processing techniques cannot be applied with the introduced privacy-preserving mechanisms. We propose a multi-stage method encompassing a careful co-design of verifiable secret sharing, secure aggregation, and a tailored symmetric private information retrieval scheme to achieve information-theoretic privacy guarantees and Byzantine resilience under data heterogeneity. We evaluate the effectiveness of our scheme on a variety of attacks and show how it outperforms the previously known techniques. Since the communication overhead of secure aggregation is non-negligible, we investigate the interplay with zero-order estimation methods that reduce the communication cost in state-of-the-art FL tasks and thereby make private aggregation scalable.

LGMay 11, 2025
Efficient Machine Unlearning by Model Splitting and Core Sample Selection

Maximilian Egger, Rawad Bitar, Rüdiger Urbanke

Machine unlearning is essential for meeting legal obligations such as the right to be forgotten, which requires the removal of specific data from machine learning models upon request. While several approaches to unlearning have been proposed, existing solutions often struggle with efficiency and, more critically, with the verification of unlearning - particularly in the case of weak unlearning guarantees, where verification remains an open challenge. We introduce a generalized variant of the standard unlearning metric that enables more efficient and precise unlearning strategies. We also present an unlearning-aware training procedure that, in many cases, allows for exact unlearning. We term our approach MaxRR. When exact unlearning is not feasible, MaxRR still supports efficient unlearning with properties closely matching those achieved through full retraining.

CRMay 11, 2025
Source Anonymity for Private Random Walk Decentralized Learning

Maximilian Egger, Svenja Lage, Rawad Bitar et al.

This paper considers random walk-based decentralized learning, where at each iteration of the learning process, one user updates the model and sends it to a randomly chosen neighbor until a convergence criterion is met. Preserving data privacy is a central concern and open problem in decentralized learning. We propose a privacy-preserving algorithm based on public-key cryptography and anonymization. In this algorithm, the user updates the model and encrypts the result using a distant user's public key. The encrypted result is then transmitted through the network with the goal of reaching that specific user. The key idea is to hide the source's identity so that, when the destination user decrypts the result, it does not know who the source was. The challenge is to design a network-dependent probability distribution (at the source) over the potential destinations such that, from the receiver's perspective, all users have a similar likelihood of being the source. We introduce the problem and construct a scheme that provides anonymity with theoretical guarantees. We focus on random regular graphs to establish rigorous guarantees.

LGJun 20, 2024
Communication-Efficient Byzantine-Resilient Federated Zero-Order Optimization

Afonso de Sá Delgado Neto, Maximilian Egger, Mayank Bakshi et al.

We introduce CYBER-0, the first zero-order optimization algorithm for memory-and-communication efficient Federated Learning, resilient to Byzantine faults. We show through extensive numerical experiments on the MNIST dataset and finetuning RoBERTa-Large that CYBER-0 outperforms state-of-the-art algorithms in terms of communication and memory efficiency while reaching similar accuracy. We provide theoretical guarantees on its convergence for convex loss functions.

ITFeb 16, 2022
Cost-Efficient Distributed Learning via Combinatorial Multi-Armed Bandits

Maximilian Egger, Rawad Bitar, Antonia Wachter-Zeh et al.

We consider the distributed SGD problem, where a main node distributes gradient calculations among $n$ workers. By assigning tasks to all the workers and waiting only for the $k$ fastest ones, the main node can trade-off the algorithm's error with its runtime by gradually increasing $k$ as the algorithm evolves. However, this strategy, referred to as adaptive $k$-sync, neglects the cost of unused computations and of communicating models to workers that reveal a straggling behavior. We propose a cost-efficient scheme that assigns tasks only to $k$ workers, and gradually increases $k$. We introduce the use of a combinatorial multi-armed bandit model to learn which workers are the fastest while assigning gradient calculations. Assuming workers with exponentially distributed response times parameterized by different means, we give empirical and theoretical guarantees on the regret of our strategy, i.e., the extra time spent to learn the mean response times of the workers. Furthermore, we propose and analyze a strategy applicable to a large class of response time distributions. Compared to adaptive $k$-sync, our scheme achieves significantly lower errors with the same computational efforts and less downlink communication while being inferior in terms of speed.

LGFeb 25, 2020
Adaptive Distributed Stochastic Gradient Descent for Minimizing Delay in the Presence of Stragglers

Serge Kas Hanna, Rawad Bitar, Parimal Parag et al.

We consider the setting where a master wants to run a distributed stochastic gradient descent (SGD) algorithm on $n$ workers each having a subset of the data. Distributed SGD may suffer from the effect of stragglers, i.e., slow or unresponsive workers who cause delays. One solution studied in the literature is to wait at each iteration for the responses of the fastest $k<n$ workers before updating the model, where $k$ is a fixed parameter. The choice of the value of $k$ presents a trade-off between the runtime (i.e., convergence rate) of SGD and the error of the model. Towards optimizing the error-runtime trade-off, we investigate distributed SGD with adaptive $k$. We first design an adaptive policy for varying $k$ that optimizes this trade-off based on an upper bound on the error as a function of the wall-clock time which we derive. Then, we propose an algorithm for adaptive distributed SGD that is based on a statistical heuristic. We implement our algorithm and provide numerical simulations which confirm our intuition and theoretical analysis.