Salim El Rouayheb

LG
h-index29
14papers
8,788citations
Novelty49%
AI Score55

14 Papers

LGJun 1, 2022
Walk for Learning: A Random Walk Approach for Federated Learning from Heterogeneous Data

Ghadir Ayache, Venkat Dassari, Salim El Rouayheb

We consider the problem of a Parameter Server (PS) that wishes to learn a model that fits data distributed on the nodes of a graph. We focus on Federated Learning (FL) as a canonical application. One of the main challenges of FL is the communication bottleneck between the nodes and the parameter server. A popular solution in the literature is to allow each node to do several local updates on the model in each iteration before sending it back to the PS. While this mitigates the communication bottleneck, the statistical heterogeneity of the data owned by the different nodes has proven to delay convergence and bias the model. In this work, we study random walk (RW) learning algorithms for tackling the communication and data heterogeneity problems. The main idea is to leverage available direct connections among the nodes themselves, which are typically "cheaper" than the communication to the PS. In a random walk, the model is thought of as a "baton" that is passed from a node to one of its neighbors after being updated in each iteration. The challenge in designing the RW is the data heterogeneity and the uncertainty about the data distributions. Ideally, we would want to visit more often nodes that hold more informative data. We cast this problem as a sleeping multi-armed bandit (MAB) to design a near-optimal node sampling strategy that achieves variance-reduced gradient estimates and approaches sub-linearly the optimal sampling strategy. Based on this framework, we present an adaptive random walk learning algorithm. We provide theoretical guarantees on its convergence. Our numerical results validate our theoretical findings and show that our algorithm outperforms existing random walk algorithms.

ITApr 21
Perfect Privacy and Strong Stationary Times for Markovian Sources

Fangwei Ye, Zonghong Liu, Parimal Parag et al.

We consider the problem of sharing correlated data under a perfect information-theoretic privacy constraint. We focus on redaction (erasure) mechanisms, in which data are either withheld or released unchanged, and measure utility by the average cardinality of the released set, equivalently, the expected Hamming distortion. Assuming the data are generated by a finite time-homogeneous Markov chain, we study the protection of the initial state while maximizing the amount of shared data. We establish a connection between perfect privacy and window-based redaction schemes, showing that erasing data up to a strong stationary time preserves privacy under suitable conditions. We further study an optimal sequential redaction mechanism and prove that it admits an equivalent window interpretation. Interestingly, we show that both mechanisms achieve the optimal distortion while redacting only a constant average number of data points, independent of the data length~$N$.

LGJul 30, 2024
The Entrapment Problem in Random Walk Decentralized Learning

Zonghong Liu, Salim El Rouayheb, Matthew Dwyer

This paper explores decentralized learning in a graph-based setting, where data is distributed across nodes. We investigate a decentralized SGD algorithm that utilizes a random walk to update a global model based on local data. Our focus is on designing the transition probability matrix to speed up convergence. While importance sampling can enhance centralized learning, its decentralized counterpart, using the Metropolis-Hastings (MH) algorithm, can lead to the entrapment problem, where the random walk becomes stuck at certain nodes, slowing convergence. To address this, we propose the Metropolis-Hastings with Lévy Jumps (MHLJ) algorithm, which incorporates random perturbations (jumps) to overcome entrapment. We theoretically establish the convergence rate and error gap of MHLJ and validate our findings through numerical experiments.

LGAug 4, 2022
Adaptive Stochastic Gradient Descent for Fast and Communication-Efficient Distributed Learning

Serge Kas Hanna, Rawad Bitar, Parimal Parag et al.

We consider the setting where a master wants to run a distributed stochastic gradient descent (SGD) algorithm on $n$ workers, each having a subset of the data. Distributed SGD may suffer from the effect of stragglers, i.e., slow or unresponsive workers who cause delays. One solution studied in the literature is to wait at each iteration for the responses of the fastest $k<n$ workers before updating the model, where $k$ is a fixed parameter. The choice of the value of $k$ presents a trade-off between the runtime (i.e., convergence rate) of SGD and the error of the model. Towards optimizing the error-runtime trade-off, we investigate distributed SGD with adaptive~$k$, i.e., varying $k$ throughout the runtime of the algorithm. We first design an adaptive policy for varying $k$ that optimizes this trade-off based on an upper bound on the error as a function of the wall-clock time that we derive. Then, we propose and implement an algorithm for adaptive distributed SGD that is based on a statistical heuristic. Our results show that the adaptive version of distributed SGD can reach lower error values in less time compared to non-adaptive implementations. Moreover, the results also show that the adaptive version is communication-efficient, where the amount of communication required between the master and the workers is less than that of non-adaptive versions.

LGJul 16, 2024
Self-Regulating Random Walks for Resilient Decentralized Learning on Graphs

Maximilian Egger, Rawad Bitar, Ghadir Ayache et al.

Consider the setting of multiple random walks (RWs) on a graph executing a certain computational task. For instance, in decentralized learning via RWs, a model is updated at each iteration based on the local data of the visited node and then passed to a randomly chosen neighbor. RWs can fail due to node or link failures. The goal is to maintain a desired number of RWs to ensure failure resilience. Achieving this is challenging due to the lack of a central entity to track which RWs have failed to replace them with new ones by forking (duplicating) surviving ones. Without duplications, the number of RWs will eventually go to zero, causing a catastrophic failure of the system. We propose two decentralized algorithms called DecAFork and DecAFork+ that can maintain the number of RWs in the graph around a desired value even in the presence of arbitrary RW failures. Nodes continuously estimate the number of surviving RWs by estimating their return time distribution and fork the RWs when failures are likely to happen. DecAFork+ additionally allows terminations to avoid overloading the network by forking too many RWs. We present extensive numerical simulations that show the performance of DecAFork and DecAFork+ regarding fast detection and reaction to failures compared to a baseline, and establish theoretical guarantees on the performance of both algorithms.

LGApr 14
Decentralized Learning via Random Walk with Jumps

Zonghong Liu, Matthew Dwyer, Salim El Rouayheb

We study decentralized learning over networks where data are distributed across nodes without a central coordinator. Random walk learning is a token-based approach in which a single model is propagated across the network and updated at each visited node using local data, thereby incurring low communication and computational overheads. In weighted random-walk learning, the transition matrix is designed to achieve a desired sampling distribution, thereby speeding up convergence under data heterogeneity. We show that implementing weighted sampling via the Metropolis-Hastings algorithm can lead to a previously unexplored phenomenon we term entrapment. The random walk may become trapped in a small region of the network, resulting in highly correlated updates and severely degraded convergence. To address this issue, we propose Metropolis-Hastings with Levy jumps, which introduces occasional long-range transitions to restore exploration while respecting local information constraints. We establish a convergence rate that explicitly characterizes the roles of data heterogeneity, network spectral gap, and jump probability, and demonstrate through experiments that MHLJ effectively eliminates entrapment and significantly speeds up decentralized learning.

MAJan 12
Self-Creating Random Walks for Decentralized Learning under Pac-Man Attacks

Xingran Chen, Parimal Parag, Rohit Bhagat et al.

Random walk (RW)-based algorithms have long been popular in distributed systems due to low overheads and scalability, with recent growing applications in decentralized learning. However, their reliance on local interactions makes them inherently vulnerable to malicious behavior. In this work, we investigate an adversarial threat that we term the ``Pac-Man'' attack, in which a malicious node probabilistically terminates any RW that visits it. This stealthy behavior gradually eliminates active RWs from the network, effectively halting the learning process without triggering failure alarms. To counter this threat, we propose the CREATE-IF-LATE (CIL) algorithm, which is a fully decentralized, resilient mechanism that enables self-creating RWs and prevents RW extinction in the presence of Pac-Man. Our theoretical analysis shows that the CIL algorithm guarantees several desirable properties, such as (i) non-extinction of the RW population, (ii) almost sure boundedness of the RW population, and (iii) convergence of RW-based stochastic gradient descent even in the presence of Pac-Man with a quantifiable deviation from the true optimum. Moreover, the learning process experiences at most a linear time delay due to Pac-Man interruptions and RW regeneration. Our extensive empirical results on both synthetic and public benchmark datasets validate our theoretical findings.

MLAug 1, 2025
Random Walk Learning and the Pac-Man Attack

Xingran Chen, Parimal Parag, Rohit Bhagat et al.

Random walk (RW)-based algorithms have long been popular in distributed systems due to low overheads and scalability, with recent growing applications in decentralized learning. However, their reliance on local interactions makes them inherently vulnerable to malicious behavior. In this work, we investigate an adversarial threat that we term the ``Pac-Man'' attack, in which a malicious node probabilistically terminates any RW that visits it. This stealthy behavior gradually eliminates active RWs from the network, effectively halting the learning process without triggering failure alarms. To counter this threat, we propose the Average Crossing (AC) algorithm--a fully decentralized mechanism for duplicating RWs to prevent RW extinction in the presence of Pac-Man. Our theoretical analysis establishes that (i) the RW population remains almost surely bounded under AC and (ii) RW-based stochastic gradient descent remains convergent under AC, even in the presence of Pac-Man, with a quantifiable deviation from the true optimum. Our extensive empirical results on both synthetic and real-world datasets corroborate our theoretical findings. Furthermore, they uncover a phase transition in the extinction probability as a function of the duplication threshold. We offer theoretical insights by analyzing a simplified variant of the AC, which sheds light on the observed phase transition.

ITSep 3, 2020
Private Weighted Random Walk Stochastic Gradient Descent

Ghadir Ayache, Salim El Rouayheb

We consider a decentralized learning setting in which data is distributed over nodes in a graph. The goal is to learn a global model on the distributed data without involving any central entity that needs to be trusted. While gossip-based stochastic gradient descent (SGD) can be used to achieve this learning objective, it incurs high communication and computation costs, since it has to wait for all the local models at all the nodes to converge. To speed up the convergence, we propose instead to study random walk based SGD in which a global model is updated based on a random walk on the graph. We propose two algorithms based on two types of random walks that achieve, in a decentralized way, uniform sampling and importance sampling of the data. We provide a non-asymptotic analysis on the rate of convergence, taking into account the constants related to the data and the graph. Our numerical results show that the weighted random walk based algorithm has a better performance for high-variance data. Moreover, we propose a privacy-preserving random walk algorithm that achieves local differential privacy based on a Gamma noise mechanism that we propose. We also give numerical results on the convergence of this algorithm and show that it outperforms additive Laplace-based privacy mechanisms.

ITJul 10, 2020
Mechanisms for Hiding Sensitive Genotypes with Information-Theoretic Privacy

Fangwei Ye, Hyunghoon Cho, Salim El Rouayheb

Motivated by the growing availability of personal genomics services, we study an information-theoretic privacy problem that arises when sharing genomic data: a user wants to share his or her genome sequence while keeping the genotypes at certain positions hidden, which could otherwise reveal critical health-related information. A straightforward solution of erasing (masking) the chosen genotypes does not ensure privacy, because the correlation between nearby positions can leak the masked genotypes. We introduce an erasure-based privacy mechanism with perfect information-theoretic privacy, whereby the released sequence is statistically independent of the sensitive genotypes. Our mechanism can be interpreted as a locally-optimal greedy algorithm for a given processing order of sequence positions, where utility is measured by the number of positions released without erasure. We show that finding an optimal order is NP-hard in general and provide an upper bound on the optimal utility. For sequences from hidden Markov models, a standard modeling approach in genetics, we propose an efficient algorithmic implementation of our mechanism with complexity polynomial in sequence length. Moreover, we illustrate the robustness of the mechanism by bounding the privacy leakage from erroneous prior distributions. Our work is a step towards more rigorous control of privacy in genomic data sharing.

LGFeb 25, 2020
Adaptive Distributed Stochastic Gradient Descent for Minimizing Delay in the Presence of Stragglers

Serge Kas Hanna, Rawad Bitar, Parimal Parag et al.

We consider the setting where a master wants to run a distributed stochastic gradient descent (SGD) algorithm on $n$ workers each having a subset of the data. Distributed SGD may suffer from the effect of stragglers, i.e., slow or unresponsive workers who cause delays. One solution studied in the literature is to wait at each iteration for the responses of the fastest $k<n$ workers before updating the model, where $k$ is a fixed parameter. The choice of the value of $k$ presents a trade-off between the runtime (i.e., convergence rate) of SGD and the error of the model. Towards optimizing the error-runtime trade-off, we investigate distributed SGD with adaptive $k$. We first design an adaptive policy for varying $k$ that optimizes this trade-off based on an upper bound on the error as a function of the wall-clock time which we derive. Then, we propose an algorithm for adaptive distributed SGD that is based on a statistical heuristic. We implement our algorithm and provide numerical simulations which confirm our intuition and theoretical analysis.

LGDec 10, 2019
Advances and Open Problems in Federated Learning

Peter Kairouz, H. Brendan McMahan, Brendan Avent et al.

Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents an extensive collection of open problems and challenges.

ITSep 1, 2017
Private Information Retrieval with Side Information

Swanand Kadhe, Brenden Garcia, Anoosheh Heidarzadeh et al.

We study the problem of Private Information Retrieval (PIR) in the presence of prior side information. The problem setup includes a database of $K$ independent messages possibly replicated on several servers, and a user that needs to retrieve one of these messages. In addition, the user has some prior side information in the form of a subset of $M$ messages, not containing the desired message and unknown to the servers. This problem is motivated by practical settings in which the user can obtain side information opportunistically from other users or has previously downloaded some messages using classical PIR schemes. The objective of the user is to retrieve the required message without revealing its identity while minimizing the amount of data downloaded from the servers. We focus on achieving information-theoretic privacy in two scenarios: (i) the user wants to protect jointly its demand and side information; (ii) the user wants to protect only the information about its demand, but not the side information. To highlight the role of side information, we focus first on the case of a single server (single database). In the first scenario, we prove that the minimum download cost is $K-M$ messages, and in the second scenario it is $\lceil \frac{K}{M+1}\rceil$ messages, which should be compared to $K$ messages, the minimum download cost in the case of no side information. Then, we extend some of our results to the case of the database replicated on multiple servers. Our proof techniques relate PIR with side information to the index coding problem. We leverage this connection to prove converse results, as well as to design achievability schemes.

ITFeb 8, 2012
Data Exchange Problem with Helpers

Nebojsa Milosavljevic, Sameer Pawar, Salim El Rouayheb et al.

In this paper we construct a deterministic polynomial time algorithm for the problem where a set of users is interested in gaining access to a common file, but where each has only partial knowledge of the file. We further assume the existence of another set of terminals in the system, called helpers, who are not interested in the common file, but who are willing to help the users. Given that the collective information of all the terminals is sufficient to allow recovery of the entire file, the goal is to minimize the (weighted) sum of bits that these terminals need to exchange over a noiseless public channel in order achieve this goal. Based on established connections to the multi-terminal secrecy problem, our algorithm also implies a polynomial-time method for constructing the largest shared secret key in the presence of an eavesdropper. We consider the following side-information settings: (i) side-information in the form of uncoded packets of the file, where the terminals' side-information consists of subsets of the file; (ii) side-information in the form of linearly correlated packets, where the terminals have access to linear combinations of the file packets; and (iii) the general setting where the the terminals' side-information has an arbitrary (i.i.d.) correlation structure. We provide a polynomial-time algorithm (in the number of terminals) that finds the optimal rate allocations for these terminals, and then determines an explicit optimal transmission scheme for cases (i) and (ii).