Rafael Pires

h-index10

20papers

247citations

Novelty58%

AI Score43

Ranked #56,708 of 194,257 authors (top 29%)#252 in DC (top 26%)

20 Papers

17.5LGOct 3, 2023Code

Epidemic Learning: Boosting Decentralized Learning with Randomized Communication

Martijn de Vos, Sadegh Farhadkhani, Rachid Guerraoui et al.

We present Epidemic Learning (EL), a simple yet powerful decentralized learning (DL) algorithm that leverages changing communication topologies to achieve faster model convergence compared to conventional DL approaches. At each round of EL, each node sends its model updates to a random sample of $s$ other nodes (in a system of $n$ nodes). We provide an extensive theoretical analysis of EL, demonstrating that its changing topology culminates in superior convergence properties compared to the state-of-the-art (static and dynamic) topologies. Considering smooth non-convex loss functions, the number of transient iterations for EL, i.e., the rounds required to achieve asymptotic linear speedup, is in $O(n^3/s^2)$ which outperforms the best-known bound $O(n^3)$ by a factor of $s^2$, indicating the benefit of randomized communication for DL. We empirically evaluate EL in a 96-node network and compare its performance with state-of-the-art DL approaches. Our results illustrate that EL converges up to $ 1.7\times$ quicker than baseline DL algorithms and attains $2.2 $\% higher accuracy for the same communication volume.

13.0DCApr 17, 2023Code

Decentralized Learning Made Easy with DecentralizePy

Akash Dhasade, Anne-Marie Kermarrec, Rafael Pires et al.

Decentralized learning (DL) has gained prominence for its potential benefits in terms of scalability, privacy, and fault tolerance. It consists of many nodes that coordinate without a central server and exchange millions of parameters in the inherently iterative process of machine learning (ML) training. In addition, these nodes are connected in complex and potentially dynamic topologies. Assessing the intricate dynamics of such networks is clearly not an easy task. Often in literature, researchers resort to simulated environments that do not scale and fail to capture practical and crucial behaviors, including the ones associated to parallelism, data transfer, network delays, and wall-clock time. In this paper, we propose DecentralizePy, a distributed framework for decentralized ML, which allows for the emulation of large-scale learning networks in arbitrary topologies. We demonstrate the capabilities of DecentralizePy by deploying techniques such as sparsification and secure aggregation on top of several topologies, including dynamic networks with more than one thousand nodes.

7.3DCJun 7, 2023Code

Get More for Less in Decentralized Learning Systems

Akash Dhasade, Anne-Marie Kermarrec, Rafael Pires et al.

Decentralized learning (DL) systems have been gaining popularity because they avoid raw data sharing by communicating only model parameters, hence preserving data confidentiality. However, the large size of deep neural networks poses a significant challenge for decentralized training, since each node needs to exchange gigabytes of data, overloading the network. In this paper, we address this challenge with JWINS, a communication-efficient and fully decentralized learning system that shares only a subset of parameters through sparsification. JWINS uses wavelet transform to limit the information loss due to sparsification and a randomized communication cut-off that reduces communication usage without damaging the performance of trained models. We demonstrate empirically with 96 DL nodes on non-IID datasets that JWINS can achieve similar accuracies to full-sharing DL while sending up to 64% fewer bytes. Additionally, on low communication budgets, JWINS outperforms the state-of-the-art communication-efficient DL algorithm CHOCO-SGD by up to 4x in terms of network savings and time.

4.6LGJul 1, 2024

Energy-Aware Decentralized Learning with Intermittent Model Training

Akash Dhasade, Paolo Dini, Elia Guerra et al.

Decentralized learning (DL) offers a powerful framework where nodes collaboratively train models without sharing raw data and without the coordination of a central server. In the iterative rounds of DL, models are trained locally, shared with neighbors in the topology, and aggregated with other models received from neighbors. Sharing and merging models contribute to convergence towards a consensus model that generalizes better across the collective data captured at training time. In addition, the energy consumption while sharing and merging model parameters is negligible compared to the energy spent during the training phase. Leveraging this fact, we present SkipTrain, a novel DL algorithm, which minimizes energy consumption in decentralized learning by strategically skipping some training rounds and substituting them with synchronization rounds. These training-silent periods, besides saving energy, also allow models to better mix and finally produce models with superior accuracy than typical DL algorithms that train at every round. Our empirical evaluations with 256 nodes demonstrate that SkipTrain reduces energy consumption by 50% and increases model accuracy by up to 12% compared to D-PSGD, the conventional DL algorithm.

14.2LGNov 11, 2024Code

Revisiting Ensembling in One-Shot Federated Learning

Youssef Allouah, Akash Dhasade, Rachid Guerraoui et al.

Federated learning (FL) is an appealing approach to training machine learning models without sharing raw data. However, standard FL algorithms are iterative and thus induce a significant communication cost. One-shot federated learning (OFL) trades the iterative exchange of models between clients and the server with a single round of communication, thereby saving substantially on communication costs. Not surprisingly, OFL exhibits a performance gap in terms of accuracy with respect to FL, especially under high data heterogeneity. We introduce FENS, a novel federated ensembling scheme that approaches the accuracy of FL with the communication efficiency of OFL. Learning in FENS proceeds in two phases: first, clients train models locally and send them to the server, similar to OFL; second, clients collaboratively train a lightweight prediction aggregator model using FL. We showcase the effectiveness of FENS through exhaustive experiments spanning several datasets and heterogeneity levels. In the particular case of heterogeneously distributed CIFAR-10 dataset, FENS achieves up to a 26.9% higher accuracy over state-of-the-art (SOTA) OFL, being only 3.1% lower than FL. At the same time, FENS incurs at most 4.3x more communication than OFL, whereas FL is at least 10.9x more communication-intensive than FENS.

6.6DCApr 15, 2024Code

Noiseless Privacy-Preserving Decentralized Learning

Sayan Biswas, Mathieu Even, Anne-Marie Kermarrec et al.

Decentralized learning (DL) enables collaborative learning without a server and without training data leaving the users' devices. However, the models shared in DL can still be used to infer training data. Conventional defenses such as differential privacy and secure aggregation fall short in effectively safeguarding user privacy in DL, either sacrificing model utility or efficiency. We introduce Shatter, a novel DL approach in which nodes create virtual nodes (VNs) to disseminate chunks of their full model on their behalf. This enhances privacy by (i) preventing attackers from collecting full models from other nodes, and (ii) hiding the identity of the original node that produced a given model chunk. We theoretically prove the convergence of Shatter and provide a formal analysis demonstrating how Shatter reduces the efficacy of attacks compared to when exchanging full models between nodes. We evaluate the convergence and attack resilience of Shatter with existing DL algorithms, with heterogeneous datasets, and against three standard privacy attacks. Our evaluation shows that Shatter not only renders these privacy attacks infeasible when each node operates 16 VNs but also exhibits a positive impact on model utility compared to standard DL. In summary, Shatter enhances the privacy of DL while maintaining the utility and efficiency of the model.

5.1DCOct 16, 2024

Boosting Asynchronous Decentralized Learning with Model Fragmentation

Sayan Biswas, Anne-Marie Kermarrec, Alexis Marouani et al.

Decentralized learning (DL) is an emerging technique that allows nodes on the web to collaboratively train machine learning models without sharing raw data. Dealing with stragglers, i.e., nodes with slower compute or communication than others, is a key challenge in DL. We present DivShare, a novel asynchronous DL algorithm that achieves fast model convergence in the presence of communication stragglers. DivShare achieves this by having nodes fragment their models into parameter subsets and send, in parallel to computation, each subset to a random sample of other nodes instead of sequentially exchanging full models. The transfer of smaller fragments allows more efficient usage of the collective bandwidth and enables nodes with slow network links to quickly contribute with at least some of their model parameters. By theoretically proving the convergence of DivShare, we provide, to the best of our knowledge, the first formal proof of convergence for a DL algorithm that accounts for the effects of asynchronous communication with delays. We experimentally evaluate DivShare against two state-of-the-art DL baselines, AD-PSGD and Swift, and with two standard datasets, CIFAR-10 and MovieLens. We find that DivShare with communication stragglers lowers time-to-accuracy by up to 3.9x compared to AD-PSGD on the CIFAR-10 dataset. Compared to baselines, DivShare also achieves up to 19.4% better accuracy and 9.5% lower test loss on the CIFAR-10 and MovieLens datasets, respectively.

7.9LGMar 18, 2024

Low-Cost Privacy-Preserving Decentralized Learning

Sayan Biswas, Davide Frey, Romaric Gaudel et al.

Decentralized learning (DL) is an emerging paradigm of collaborative machine learning that enables nodes in a network to train models collectively without sharing their raw data or relying on a central server. This paper introduces Zip-DL, a privacy-aware DL algorithm that leverages correlated noise to achieve robust privacy against local adversaries while ensuring efficient convergence at low communication costs. By progressively neutralizing the noise added during distributed averaging, Zip-DL combines strong privacy guarantees with high model accuracy. Its design requires only one communication round per gradient descent iteration, significantly reducing communication overhead compared to competitors. We establish theoretical bounds on both convergence speed and privacy guarantees. Moreover, extensive experiments demonstrating Zip-DL's practical applicability make it outperform state-of-the-art methods in the accuracy vs. vulnerability trade-off. Specifically, Zip-DL (i) reduces membership-inference attack success rates by up to 35% compared to baseline DL, (ii) decreases attack efficacy by up to 13% compared to competitors offering similar utility, and (iii) achieves up to 59% higher accuracy to completely nullify a basic attack scenario, compared to a state-of-the-art privacy-preserving approach under the same threat model. These results position Zip-DL as a practical and efficient solution for privacy-preserving decentralized learning in real-world applications.

5.9DBMar 7, 2025

Leveraging Approximate Caching for Faster Retrieval-Augmented Generation

Shai Bergman, Anne-Marie Kermarrec, Diana Petrescu et al.

Retrieval-augmented generation (RAG) improves the reliability of large language model (LLM) answers by integrating external knowledge. However, RAG increases the end-to-end inference time since looking for relevant documents from large vector databases is computationally expensive. To address this, we introduce Proximity, an approximate key-value cache that optimizes the RAG workflow by leveraging similarities in user queries. Instead of treating each query independently, Proximity reuses previously retrieved documents when similar queries appear, substantially reducing the reliance on expensive vector database lookups. To efficiently scale, Proximity employs a locality-sensitive hashing (LSH) scheme that enables fast cache lookups while preserving retrieval accuracy. We evaluate Proximity using the MMLU and MedRAG question-answering benchmarks. Our experiments demonstrate that Proximity with our LSH scheme and a realistically-skewed MedRAG workload reduces database calls by 77.2% while maintaining database recall and test accuracy. We experiment with different similarity tolerances and cache capacities, and show that the time spent within the Proximity cache remains low and constant (4.8 microseconds) even as the cache grows substantially in size. Our results demonstrate that approximate caching is a practical and effective strategy for optimizing RAG-based systems.

6.4LGMay 13, 2024

Secure Aggregation Meets Sparsification in Decentralized Learning

Sayan Biswas, Anne-Marie Kermarrec, Rafael Pires et al.

Decentralized learning (DL) faces increased vulnerability to privacy breaches due to sophisticated attacks on machine learning (ML) models. Secure aggregation is a computationally efficient cryptographic technique that enables multiple parties to compute an aggregate of their private data while keeping their individual inputs concealed from each other and from any central aggregator. To enhance communication efficiency in DL, sparsification techniques are used, selectively sharing only the most crucial parameters or gradients in a model, thereby maintaining efficiency without notably compromising accuracy. However, applying secure aggregation to sparsified models in DL is challenging due to the transmission of disjoint parameter sets by distinct nodes, which can prevent masks from canceling out effectively. This paper introduces CESAR, a novel secure aggregation protocol for DL designed to be compatible with existing sparsification mechanisms. CESAR provably defends against honest-but-curious adversaries and can be formally adapted to counteract collusion between them. We provide a foundational understanding of the interaction between the sparsification carried out by the nodes and the proportion of the parameters shared under CESAR in both colluding and non-colluding environments, offering analytical insight into the working and applicability of the protocol. Experiments on a network with 48 nodes in a 3-regular topology show that with random subsampling, CESAR is always within 0.5% accuracy of decentralized parallel stochastic gradient descent (D-PSGD), while adding only 11% of data overhead. Moreover, it surpasses the accuracy on TopK by up to 0.3% on independent and identically distributed (IID) data.

1.2DCSep 2, 2025

Efficient Pyramidal Analysis of Gigapixel Images on a Decentralized Modest Computer Cluster

Marie Reinbigler, Rishi Sharma, Rafael Pires et al.

Analyzing gigapixel images is recognized as computationally demanding. In this paper, we introduce PyramidAI, a technique for analyzing gigapixel images with reduced computational cost. The proposed approach adopts a gradual analysis of the image, beginning with lower resolutions and progressively concentrating on regions of interest for detailed examination at higher resolutions. We investigated two strategies for tuning the accuracy-computation performance trade-off when implementing the adaptive resolution selection, validated against the Camelyon16 dataset of biomedical images. Our results demonstrate that PyramidAI substantially decreases the amount of processed data required for analysis by up to 2.65x, while preserving the accuracy in identifying relevant sections on a single computer. To ensure democratization of gigapixel image analysis, we evaluated the potential to use mainstream computers to perform the computation by exploiting the parallelism potential of the approach. Using a simulator, we estimated the best data distribution and load balancing algorithm according to the number of workers. The selected algorithms were implemented and highlighted the same conclusions in a real-world setting. Analysis time is reduced from more than an hour to a few minutes using 12 modest workers, offering a practical solution for efficient large-scale image analysis.

2.6LGMay 24, 2024

Harnessing Increased Client Participation with Cohort-Parallel Federated Learning

Akash Dhasade, Anne-Marie Kermarrec, Tuan-Anh Nguyen et al.

Federated learning (FL) is a machine learning approach where nodes collaboratively train a global model. As more nodes participate in a round of FL, the effectiveness of individual model updates by nodes also diminishes. In this study, we increase the effectiveness of client updates by dividing the network into smaller partitions, or cohorts. We introduce Cohort-Parallel Federated Learning (CPFL): a novel learning approach where each cohort independently trains a global model using FL, until convergence, and the produced models by each cohort are then unified using knowledge distillation. The insight behind CPFL is that smaller, isolated networks converge quicker than in a one-network setting where all nodes participate. Through exhaustive experiments involving realistic traces and non-IID data distributions on the CIFAR-10 and FEMNIST image classification tasks, we investigate the balance between the number of cohorts, model accuracy, training time, and compute resources. Compared to traditional FL, CPFL with four cohorts, non-IID data distribution, and CIFAR-10 yields a 1.9x reduction in train time and a 1.3x reduction in resource usage, with a minimal drop in test accuracy.

4.3DCFeb 23, 2022Code

TEE-based decentralized recommender systems: The raw data sharing redemption

Akash Dhasade, Nevena Dresevic, Anne-Marie Kermarrec et al.

Recommenders are central in many applications today. The most effective recommendation schemes, such as those based on collaborative filtering (CF), exploit similarities between user profiles to make recommendations, but potentially expose private data. Federated learning and decentralized learning systems address this by letting the data stay on user's machines to preserve privacy: each user performs the training on local data and only the model parameters are shared. However, sharing the model parameters across the network may still yield privacy breaches. In this paper, we present REX, the first enclave-based decentralized CF recommender. REX exploits Trusted execution environments (TEE), such as Intel software guard extensions (SGX), that provide shielded environments within the processor to improve convergence while preserving privacy. Firstly, REX enables raw data sharing, which ultimately speeds up convergence and reduces the network load. Secondly, REX fully preserves privacy. We analyze the impact of raw data sharing in both deep neural network (DNN) and matrix factorization (MF) recommenders and showcase the benefits of trusted environments in a full-fledged implementation of REX. Our experimental results demonstrate that through raw data sharing, REX significantly decreases the training time by 18.3x and the network load by 2 orders of magnitude over standard decentralized approaches that share only parameters, while fully protecting privacy by leveraging trustworthy hardware enclaves with very little overhead.

3.1LGOct 21, 2021

Boosting Resource-Constrained Federated Learning Systems with Guessed Updates

Mohamed Yassine Boukhari, Akash Dhasade, Anne-Marie Kermarrec et al.

Federated learning (FL) enables a set of client devices to collaboratively train a model without sharing raw data. This process, though, operates under the constrained computation and communication resources of edge devices. These constraints combined with systems heterogeneity force some participating clients to perform fewer local updates than expected by the server, thus slowing down convergence. Exhaustive tuning of hyperparameters in FL, furthermore, can be resource-intensive, without which the convergence is adversely affected. In this work, we propose GEL, the guess and learn algorithm. GEL enables constrained edge devices to perform additional learning through guessed updates on top of gradient-based steps. These guesses are gradientless, i.e., participating clients leverage them for free. Our generic guessing algorithm (i) can be flexibly combined with several state-of-the-art algorithms including FEDPROX, FEDNOVA, FEDYOGI or SCALEFL; and (ii) achieves significantly improved performance when the learning rates are not best tuned. We conduct extensive experiments and show that GEL can boost empirical convergence by up to 40% in resource constrained networks while relieving the need for exhaustive learning rate tuning.

16.2CRMar 31, 2020

Trust Management as a Service: Enabling Trusted Execution in the Face of Byzantine Stakeholders

Franz Gregor, Wojciech Ozga, Sébastien Vaucher et al.

Trust is arguably the most important challenge for critical services both deployed as well as accessed remotely over the network. These systems are exposed to a wide diversity of threats, ranging from bugs to exploits, active attacks, rogue operators, or simply careless administrators. To protect such applications, one needs to guarantee that they are properly configured and securely provisioned with the "secrets" (e.g., encryption keys) necessary to preserve not only the confidentiality, integrity and freshness of their data but also their code. Furthermore, these secrets should not be kept under the control of a single stakeholder - which might be compromised and would represent a single point of failure - and they must be protected across software versions in the sense that attackers cannot get access to them via malicious updates. Traditional approaches for solving these challenges often use ad hoc techniques and ultimately rely on a hardware security module (HSM) as root of trust. We propose a more powerful and generic approach to trust management that instead relies on trusted execution environments (TEEs) and a set of stakeholders as root of trust. Our system, PALAEMON, can operate as a managed service deployed in an untrusted environment, i.e., one can delegate its operations to an untrusted cloud provider with the guarantee that data will remain confidential despite not trusting any individual human (even with root access) nor system software. PALAEMON addresses in a secure, efficient and cost-effective way five main challenges faced when developing trusted networked applications and services. Our evaluation on a range of benchmarks and real applications shows that PALAEMON performs efficiently and can protect secrets of services without any change to their source code.

2.9CRJan 27, 2020

Distributed systems and trusted execution environments: Trade-offs and challenges

Rafael Pereira Pires

Security and privacy concerns in computer systems have grown in importance with the ubiquity of connected devices. TEEs provide security guarantees based on cryptographic constructs built in hardware. Intel software guard extensions (SGX), in particular, implements powerful mechanisms that can shield sensitive data even from privileged users with full control of system software. In this work, we essentially explore some of the challenges of designing secure distributed systems by using Intel SGX as cornerstone. We do so by designing and experimentally evaluating several elementary systems ranging from communication and processing middleware to a peer-to-peer privacy-preserving solution. We start with support systems that naturally fit cloud deployment scenarios, namely content-based routing, batching and stream processing frameworks. We implement prototypes and use them to analyse the manifested memory usage issues intrinsic to SGX. Next, we aim at protecting very sensitive data: cryptographic keys. By leveraging TEEs, we design protocols for group data sharing that have lower computational complexity than legacy methods. As a bonus, our proposals allow large savings on metadata volume and processing time of cryptographic operations, all with equivalent security guarantees. Finally, we propose privacy-preserving systems against established services like web-search engines. Our evaluation shows that we propose the most robust system in comparison to existing solutions with regard to user re-identification rates and results accuracy in a scalable way. Overall, this thesis proposes new mechanisms that take advantage of TEEs for distributed system architectures. We show through an empirical approach on top of Intel SGX what are the trade-offs of distinct designs applied to distributed communication and processing, cryptographic protocols and private web search.

9.7CRJul 11, 2019

Malware in the SGX supply chain: Be careful when signing enclaves!

Vlad Crăciun, Pascal Felber, Andrei Mogage et al.

Malware attacks are a significant part of the new software security threats detected each year. Intel Software Guard Extensions (SGX) are a set of hardware instructions introduced by Intel in their recent lines of processors that are intended to provide a secure execution environment for user-developed applications. To our knowledge, there was no serious attempt yet to overcome the SGX protection by exploiting the weaknesses in the software supply chain infrastructure, namely at the level of the development, build or signing servers. While SGX protection does not specifically take into consideration such threats, we show in the current paper that a simple malware attack exploiting a separation between the build and signing processes can have a serious damaging impact, practically nullifying SGX integrity protection measures. We also explore two possible mitigations against the attack, one centralized leveraging SGX itself, and one distributed that relies on a smart contract deployed on a blockchain infrastructure. Our evaluation shows that both methods are feasible in practice and their added costs are acceptable for the offered protection.

7.3DCMay 4, 2018Code

SecureStreams: A Reactive Middleware Framework for Secure Data Stream Processing

Aurélien Havet, Rafael Pires, Pascal Felber et al.

The growing adoption of distributed data processing frameworks in a wide diversity of application domains challenges end-to-end integration of properties like security, in particular when considering deployments in the context of large-scale clusters or multi-tenant Cloud infrastructures. This paper therefore introduces SecureStreams, a reactive middleware framework to deploy and process secure streams at scale. Its design combines the high-level reactive dataflow programming paradigm with Intel's low-level software guard extensions (SGX) in order to guarantee privacy and integrity of the processed data. The experimental results of SecureStreams are promising: while offering a fluent scripting language based on Lua, our middleware delivers high processing throughput, thus enabling developers to implement secure processing pipelines in just few lines of code.

7.3DCMay 3, 2018

CYCLOSA: Decentralizing Private Web Search Through SGX-Based Browser Extensions

Rafael Pires, David Goltzsche, Sonia Ben Mokhtar et al.

By regularly querying Web search engines, users (unconsciously) disclose large amounts of their personal data as part of their search queries, among which some might reveal sensitive information (e.g. health issues, sexual, political or religious preferences). Several solutions exist to allow users querying search engines while improving privacy protection. However, these solutions suffer from a number of limitations: some are subject to user re-identification attacks, while others lack scalability or are unable to provide accurate results. This paper presents CYCLOSA, a secure, scalable and accurate private Web search solution. CYCLOSA improves security by relying on trusted execution environments (TEEs) as provided by Intel SGX. Further, CYCLOSA proposes a novel adaptive privacy protection solution that reduces the risk of user re- identification. CYCLOSA sends fake queries to the search engine and dynamically adapts their count according to the sensitivity of the user query. In addition, CYCLOSA meets scalability as it is fully decentralized, spreading the load for distributing fake queries among other nodes. Finally, CYCLOSA achieves accuracy of Web search as it handles the real query and the fake queries separately, in contrast to other existing solutions that mix fake and real query results.

3.3DCMay 16, 2017Code

A lightweight MapReduce framework for secure processing with SGX

Rafael Pires, Daniel Gavril, Pascal Felber et al.

MapReduce is a programming model used extensively for parallel data processing in distributed environments. A wide range of algorithms were implemented using MapReduce, from simple tasks like sorting and searching up to complex clustering and machine learning operations. Many of these implementations are part of services externalized to cloud infrastructures. Over the past years, however, many concerns have been raised regarding the security guarantees offered in such environments. Some solutions relying on cryptography were proposed for countering threats but these typically imply a high computational overhead. Intel, the largest manufacturer of commodity CPUs, recently introduced SGX (software guard extensions), a set of hardware instructions that support execution of code in an isolated secure environment. In this paper, we explore the use of Intel SGX for providing privacy guarantees for MapReduce operations, and based on our evaluation we conclude that it represents a viable alternative to a cryptographic mechanism. We present results based on the widely used k-means clustering algorithm, but our implementation can be generalized to other applications that can be expressed using MapReduce model.