Christopher G. Brinton

h-index: 26

78 papers

1,855 citations

Novelty: 51%

AI Score: 52

Ranked #15,579 of 194,257 authors (top 8%) · #3,935 in LG (top 10%)

78 Papers

24.8 · LG · Aug 21, 2023

Federated Learning for Connected and Automated Vehicles: A Survey of Existing Approaches and Challenges

Vishnu Pandi Chellapandi, Liangqi Yuan, Christopher G. Brinton et al.

Machine learning (ML) is widely used for key tasks in Connected and Automated Vehicles (CAV), including perception, planning, and control. However, its reliance on vehicular data for model training presents significant challenges related to in-vehicle user privacy and communication overhead generated by massive data volumes. Federated learning (FL) is a decentralized ML approach that enables multiple vehicles to collaboratively develop models, broadening learning from various driving environments, enhancing overall performance, and simultaneously protecting local vehicle data privacy and security. This survey paper presents a review of the advancements made in the application of FL for CAV (FL4CAV). First, centralized and decentralized frameworks of FL are analyzed, highlighting their key characteristics and methodologies. Second, diverse data sources, models, and data security techniques relevant to FL in CAVs are reviewed, emphasizing their significance in ensuring privacy and confidentiality. Third, specific applications of FL are explored, providing insight into the base models and datasets employed for each application. Finally, existing challenges for FL4CAV are listed and potential directions for future investigation to further enhance the effectiveness and efficiency of FL in the context of CAV are discussed.

14.1 · LG · Mar 18, 2022

Latency Optimization for Blockchain-Empowered Federated Learning in Multi-Server Edge Computing

Dinh C. Nguyen, Seyyedali Hosseinalipour, David J. Love et al.

In this paper, we study a new latency optimization problem for blockchain-based federated learning (BFL) in multi-server edge computing. In this system model, distributed mobile devices (MDs) communicate with a set of edge servers (ESs) to handle both machine learning (ML) model training and block mining simultaneously. To assist the ML model training for resource-constrained MDs, we develop an offloading strategy that enables MDs to transmit their data to one of the associated ESs. We then propose a new decentralized ML model aggregation solution at the edge layer based on a consensus mechanism to build a global ML model via peer-to-peer (P2P)-based blockchain communications. Blockchain builds trust among MDs and ESs to facilitate reliable ML model sharing and cooperative consensus formation, and enables rapid elimination of manipulated models caused by poisoning attacks. We formulate latency-aware BFL as an optimization aiming to minimize the system latency via joint consideration of the data offloading decisions, MDs' transmit power, channel bandwidth allocation for MDs' data offloading, MDs' computational allocation, and hash power allocation. Given the mixed action space of discrete offloading and continuous allocation variables, we propose a novel deep reinforcement learning scheme with a parameterized advantage actor critic algorithm. We theoretically characterize the convergence properties of BFL in terms of the aggregation delay, mini-batch size, and number of P2P communication rounds. Our numerical evaluation demonstrates the superiority of our proposed scheme over baselines in terms of model training efficiency, convergence rate, system latency, and robustness against model poisoning attacks.

14.6 · LG · Mar 26, 2022

Multi-Edge Server-Assisted Dynamic Federated Learning with an Optimized Floating Aggregation Point

Bhargav Ganguly, Seyyedali Hosseinalipour, Kwang Taik Kim et al.

We propose cooperative edge-assisted dynamic federated learning (CE-FL). CE-FL introduces a distributed machine learning (ML) architecture, where data collection is carried out at the end devices, while the model training is conducted cooperatively at the end devices and the edge servers, enabled via data offloading from the end devices to the edge servers through base stations. CE-FL also introduces a floating aggregation point, where the local models generated at the devices and the servers are aggregated at an edge server, which varies from one model training round to another to cope with the network evolution in terms of data distribution and users' mobility. CE-FL considers the heterogeneity of network elements in terms of communication/computation models and the proximity to one another. CE-FL further presumes a dynamic environment with online variation of data at the network devices, which causes a drift in the ML model performance. We model the processes taken during CE-FL, and conduct an analytical convergence analysis of its ML model training. We then formulate network-aware CE-FL which aims to adaptively optimize all the network elements via tuning their contribution to the learning process, which turns out to be a non-convex mixed integer problem. Motivated by the large scale of the system, we propose a distributed optimization solver to break down the computation of the solution across the network elements. We finally demonstrate the effectiveness of our framework with the data collected from a real-world testbed.

6.6 · DC · Mar 15, 2023

Towards Cooperative Federated Learning over Heterogeneous Edge/Fog Networks

Su Wang, Seyyedali Hosseinalipour, Vaneet Aggarwal et al.

Federated learning (FL) has been promoted as a popular technique for training machine learning (ML) models over edge/fog networks. Traditional implementations of FL have largely neglected the potential for inter-network cooperation, treating edge/fog devices and other infrastructure participating in ML as separate processing elements. Consequently, FL has been vulnerable to several dimensions of network heterogeneity, such as varying computation capabilities, communication resources, data qualities, and privacy demands. We advocate for cooperative federated learning (CFL), a cooperative edge/fog ML paradigm built on device-to-device (D2D) and device-to-server (D2S) interactions. Through D2D and D2S cooperation, CFL counteracts network heterogeneity in edge/fog networks through enabling a model/data/resource pooling mechanism, which will yield substantial improvements in ML model training quality and network resource consumption. We propose a set of core methodologies that form the foundation of D2D and D2S cooperation and present preliminary experiments that demonstrate their benefits. We also discuss new FL functionalities enabled by this cooperative framework such as the integration of unlabeled data and heterogeneous device privacy into ML model training. Finally, we describe some open research directions at the intersection of cooperative edge/fog and FL.

2.3 · DC · Apr 20, 2023

A Comparative Analysis of Distributed Linear Solvers under Data Heterogeneity

Boris Velasevic, Rohit Parasnis, Christopher G. Brinton et al. · mit

We consider the problem of solving a large-scale system of linear equations in a distributed or federated manner by a taskmaster and a set of machines, each possessing a subset of the equations. We provide a comprehensive comparison of two well-known classes of algorithms used to solve this problem: projection-based methods and optimization-based methods. First, we introduce a novel geometric notion of data heterogeneity called angular heterogeneity and discuss its generality. Using this notion, we characterize the optimal convergence rates of the most prominent algorithms from each class, capturing the effects of the number of machines, the number of equations, and that of both cross-machine and local data heterogeneity on these rates. Our analysis establishes the superiority of Accelerated Projected Consensus in realistic scenarios with significant data heterogeneity and offers several insights into how angular heterogeneity affects the efficiency of the methods studied. Additionally, we develop distributed algorithms for the efficient computation of the proposed angular heterogeneity metrics. Our extensive numerical analyses validate and complement our theoretical results.

15.7 · LG · Sep 27, 2024 · Code

Hierarchical Federated Learning with Multi-Timescale Gradient Correction

Wenzhi Fang, Dong-Jun Han, Evan Chen et al.

While traditional federated learning (FL) typically focuses on a star topology where clients are directly connected to a central server, real-world distributed systems often exhibit hierarchical architectures. Hierarchical FL (HFL) has emerged as a promising solution to bridge this gap, leveraging aggregation points at multiple levels of the system. However, existing algorithms for HFL encounter challenges in dealing with multi-timescale model drift, i.e., model drift occurring across hierarchical levels of data heterogeneity. In this paper, we propose a multi-timescale gradient correction (MTGC) methodology to resolve this issue. Our key idea is to introduce distinct control variables to (i) correct the client gradient towards the group gradient, i.e., to reduce client model drift caused by local updates based on individual datasets, and (ii) correct the group gradient towards the global gradient, i.e., to reduce group model drift caused by FL over clients within the group. We analytically characterize the convergence behavior of MTGC under general non-convex settings, overcoming challenges associated with couplings between correction terms. We show that our convergence bound is immune to the extent of data heterogeneity, confirming the stability of the proposed algorithm against multi-level non-i.i.d. data. Through extensive experiments on various datasets and models, we validate the effectiveness of MTGC in diverse HFL settings. The code for this project is available at https://github.com/wenzhifang/MTGC.
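
The two correction terms described in this abstract can be illustrated with a toy example. The sketch below is not the authors' MTGC implementation (see their repository for that); it assumes scalar quadratic losses, evaluates every gradient at the same point, and uses hypothetical names (`grad`, `corrected_gradient`). In a real algorithm the corrections would typically be computed from stale information and held fixed over several local steps.

```python
from statistics import fmean


def grad(x, target):
    """Gradient of the toy loss 0.5 * (x - target)**2 (an assumed loss)."""
    return x - target


def corrected_gradient(x, client_target, group_targets, all_groups):
    """Client gradient plus a client-to-group and a group-to-global correction."""
    g_client = grad(x, client_target)
    g_group = fmean(grad(x, t) for t in group_targets)
    g_global = fmean(fmean(grad(x, t) for t in grp) for grp in all_groups)
    client_correction = g_group - g_client  # pulls client gradient toward group
    group_correction = g_global - g_group   # pulls group gradient toward global
    return g_client + client_correction + group_correction


# Two groups of two clients each, with very different local optima.
groups = [[1.0, 3.0], [10.0, 14.0]]
x = 0.0
g = corrected_gradient(x, groups[0][0], groups[0], groups)
print(g)
```

When all terms are evaluated at the same point, the corrected direction coincides with the global gradient regardless of how different the clients are, which is the intuition behind drift correction at both levels.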

7.8 · LG · Aug 4, 2022

Embedding Alignment for Unsupervised Federated Learning via Smart Data Exchange

Satyavrat Wagle, Seyyedali Hosseinalipour, Naji Khosravan et al.

Federated learning (FL) has been recognized as one of the most promising solutions for distributed machine learning (ML). In most of the current literature, FL has been studied for supervised ML tasks, in which edge devices collect labeled data. Nevertheless, in many applications, it is impractical to assume the existence of labeled data across devices. To this end, we develop a novel methodology, Cooperative Federated unsupervised Contrastive Learning (CF-CL), for FL across edge devices with unlabeled datasets. CF-CL employs local device cooperation where data are exchanged among devices through device-to-device (D2D) communications to avoid local model bias resulting from non-independent and identically distributed (non-i.i.d.) local datasets. CF-CL introduces a push-pull smart data sharing mechanism tailored to unsupervised FL settings, in which each device pushes a subset of its local datapoints to its neighbors as reserved datapoints, and pulls a set of datapoints from its neighbors, sampled through a probabilistic importance sampling technique. We demonstrate that CF-CL leads to (i) alignment of unsupervised learned latent spaces across devices, (ii) faster global convergence, allowing for less frequent global model aggregations, and (iii) effectiveness in extreme non-i.i.d. data settings across the devices.

4.3 · IT · Apr 25, 2023

Robust Non-Linear Feedback Coding via Power-Constrained Deep Learning

Junghoon Kim, Taejoon Kim, David Love et al.

The design of codes for feedback-enabled communications has been a long-standing open problem. Recent non-linear, deep learning-based coding schemes have demonstrated significant improvements in communication reliability over linear codes, but are still vulnerable to the presence of forward and feedback noise over the channel. In this paper, we develop a new family of non-linear feedback codes that greatly enhance robustness to channel noise. Our autoencoder-based architecture is designed to learn codes based on consecutive blocks of bits, which obtains de-noising advantages over bit-by-bit processing to help overcome the physical separation between the encoder and decoder over a noisy channel. Moreover, we develop a power control layer at the encoder to explicitly incorporate hardware constraints into the learning optimization, and prove that the resulting average power constraint is satisfied asymptotically. Numerical experiments demonstrate that our scheme outperforms state-of-the-art feedback codes by wide margins over practical forward and feedback noise regimes, and provide information-theoretic insights on the behavior of our non-linear codes. Moreover, we observe that, in a long blocklength regime, canonical error correction codes are still preferable to feedback codes when the feedback noise becomes high.

18.5 · LG · Dec 16, 2022

SplitGP: Achieving Both Generalization and Personalization in Federated Learning

Dong-Jun Han, Do-Yeon Kim, Minseok Choi et al.

A fundamental challenge to providing edge-AI services is the need for a machine learning (ML) model that achieves personalization (i.e., to individual clients) and generalization (i.e., to unseen data) properties concurrently. Existing techniques in federated learning (FL) have encountered a steep tradeoff between these objectives and impose large computational requirements on edge devices during training and inference. In this paper, we propose SplitGP, a new split learning solution that can simultaneously capture generalization and personalization capabilities for efficient inference across resource-constrained clients (e.g., mobile/IoT devices). Our key idea is to split the full ML model into client-side and server-side components, and impose different roles to them: the client-side model is trained to have strong personalization capability optimized to each client's main task, while the server-side model is trained to have strong generalization capability for handling all clients' out-of-distribution tasks. We analytically characterize the convergence behavior of SplitGP, revealing that all client models approach stationary points asymptotically. Further, we analyze the inference time in SplitGP and provide bounds for determining model split ratios. Experimental results show that SplitGP outperforms existing baselines by wide margins in inference time and test accuracy for varying amounts of out-of-distribution samples.

11.8 · LG · Sep 21, 2022

Performance Optimization for Variable Bitwidth Federated Learning in Wireless Networks

Sihua Wang, Mingzhe Chen, Christopher G. Brinton et al.

This paper considers improving wireless communication and computation efficiency in federated learning (FL) via model quantization. In the proposed bitwidth FL scheme, edge devices train and transmit quantized versions of their local FL model parameters to a coordinating server, which aggregates them into a quantized global model and synchronizes the devices. The goal is to jointly determine the bitwidths employed for local FL model quantization and the set of devices participating in FL training at each iteration. We pose this as an optimization problem that aims to minimize the training loss of quantized FL under a per-iteration device sampling budget and delay requirement. However, the formulated problem is difficult to solve without (i) a concrete understanding of how quantization impacts global ML performance and (ii) the ability of the server to construct estimates of this process efficiently. To address the first challenge, we analytically characterize how limited wireless resources and induced quantization errors affect the performance of the proposed FL method. Our results quantify how the improvement of FL training loss between two consecutive iterations depends on the device selection and quantization scheme as well as on several parameters inherent to the model being learned. Then, we show that the FL training process can be described as a Markov decision process and propose a model-based reinforcement learning (RL) method to optimize action selection over iterations. Compared to model-free RL, this model-based RL approach leverages the derived mathematical characterization of the FL training process to discover an effective device selection and quantization scheme without imposing additional device communication overhead. Simulation results show that the proposed FL algorithm can reduce the convergence time.
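
To make the role of the bitwidth concrete, here is a minimal sketch of uniform quantization of a parameter vector. It is an illustration only, not the paper's scheme: the deterministic rounding rule, the `[min, max]` range, and the function name `quantize` are all assumptions.

```python
def quantize(values, bits):
    """Uniformly quantize floats to 2**bits levels spanning [min, max]."""
    lo, hi = min(values), max(values)
    levels = 2 ** bits - 1  # number of intervals between quantization levels
    if hi == lo or levels == 0:
        return [lo for _ in values]
    step = (hi - lo) / levels
    # Snap each value to the nearest level; error is at most step / 2.
    return [lo + round((v - lo) / step) * step for v in values]


weights = [-0.8, -0.1, 0.0, 0.35, 0.9]  # hypothetical local model parameters
q2 = quantize(weights, 2)
q8 = quantize(weights, 8)
err2 = max(abs(a - b) for a, b in zip(weights, q2))
err8 = max(abs(a - b) for a, b in zip(weights, q8))
print(err2, err8)
```

A smaller bitwidth shrinks the payload each device transmits but increases the worst-case quantization error, which is the trade-off the optimization in the abstract has to balance.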

13.6 · LG · Aug 2, 2022

Mitigating Biases in Student Performance Prediction via Attention-Based Personalized Federated Learning

Yun-Wei Chu, Seyyedali Hosseinalipour, Elizabeth Tenorio et al.

Traditional learning-based approaches to student modeling generalize poorly to underrepresented student groups due to biases in data availability. In this paper, we propose a methodology for predicting student performance from their online learning activities that optimizes inference accuracy over different demographic groups such as race and gender. Building upon recent foundations in federated learning, in our approach, personalized models for individual student subgroups are derived from a global model aggregated across all student models via meta-gradient updates that account for subgroup heterogeneity. To learn better representations of student activity, we augment our approach with a self-supervised behavioral pretraining methodology that leverages multiple modalities of student behavior (e.g., visits to lecture videos and participation on forums), and include a neural network attention mechanism in the model aggregation stage. Through experiments on three real-world datasets from online courses, we demonstrate that our approach obtains substantial improvements over existing student modeling baselines in predicting student learning outcomes for all subgroups. Visual analysis of the resulting student embeddings confirms that our personalization methodology indeed identifies different activity patterns within different subgroups, consistent with its stronger inference ability compared with the baselines.

2.3 · IT · Feb 23, 2023

Coded Matrix Computations for D2D-enabled Linearized Federated Learning

Anindya Bijoy Das, Aditya Ramamoorthy, David J. Love et al.

Federated learning (FL) is a popular technique for training a global model on data distributed across client devices. Like other distributed training techniques, FL is susceptible to straggler (slower or failed) clients. Recent work has proposed to address this through device-to-device (D2D) offloading, which introduces privacy concerns. In this paper, we propose a novel straggler-optimal approach for coded matrix computations which can significantly reduce the communication delay and privacy issues introduced from D2D data transmissions in FL. Moreover, our proposed approach leads to a considerable improvement of the local computation speed when the generated data matrix is sparse. Numerical evaluations confirm the superiority of our proposed method over baseline approaches.

11.8 · LG · Apr 7, 2022 · Code

Decentralized Event-Triggered Federated Learning with Heterogeneous Communication Thresholds

Shahryar Zehtabi, Seyyedali Hosseinalipour, Christopher G. Brinton

A recent emphasis of distributed learning research has been on federated learning (FL), in which model training is conducted by the data-collecting devices. Existing research on FL has mostly focused on a star topology learning architecture with synchronized (time-triggered) model training rounds, where the local models of the devices are periodically aggregated by a centralized coordinating node. However, in many settings, such a coordinating node may not exist, motivating efforts to fully decentralize FL. In this work, we propose a novel methodology for distributed model aggregations via asynchronous, event-triggered consensus iterations over the network graph topology. We consider heterogeneous communication event thresholds at each device that weigh the change in local model parameters against the available local resources in deciding the benefit of aggregations at each iteration. Through theoretical analysis, we demonstrate that our methodology achieves asymptotic convergence to the globally optimal learning model under standard assumptions in distributed learning and graph consensus literature, and without restrictive connectivity requirements on the underlying topology. Subsequent numerical results demonstrate that our methodology obtains substantial improvements in communication requirements compared with FL baselines.
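
The event-triggering idea can be sketched as a simple test on how far a device's model has moved since its last broadcast. This is a hedged illustration rather than the paper's triggering rule: the L2 norm, the fixed per-device `threshold`, and the name `should_broadcast` are assumptions. In the setting described above, the threshold would be chosen per device to reflect its available resources.

```python
import math


def should_broadcast(current, last_sent, threshold):
    """Return True when the model has drifted more than `threshold` in L2 norm."""
    drift = math.sqrt(sum((c - s) ** 2 for c, s in zip(current, last_sent)))
    return drift > threshold


last_sent = [0.0, 0.0]
small_change = [0.1, 0.1]   # drift of about 0.14
large_change = [0.6, 0.8]   # drift of 1.0
threshold = 0.5             # hypothetical value for one device

print(should_broadcast(small_change, last_sent, threshold))
print(should_broadcast(large_change, last_sent, threshold))
```

A resource-constrained device could be given a larger threshold so that it communicates less often, while a well-provisioned device could use a smaller one.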

8.7 · LG · Dec 5, 2022

Multi-Layer Personalized Federated Learning for Mitigating Biases in Student Predictive Analytics

Yun-Wei Chu, Seyyedali Hosseinalipour, Elizabeth Tenorio et al.

Conventional methods for student modeling, which involve predicting grades based on measured activities, struggle to provide accurate results for minority/underrepresented student groups due to data availability biases. In this paper, we propose a Multi-Layer Personalized Federated Learning (MLPFL) methodology that optimizes inference accuracy over different layers of student grouping criteria, such as by course and by demographic subgroups within each course. In our approach, personalized models for individual student subgroups are derived from a global model, which is trained in a distributed fashion via meta-gradient updates that account for subgroup heterogeneity while preserving modeling commonalities that exist across the full dataset. The evaluation of the proposed methodology considers case studies of two popular downstream student modeling tasks, knowledge tracing and outcome prediction, which leverage multiple modalities of student behavior (e.g., visits to lecture videos and participation on forums) in model training. Experiments on three real-world online course datasets show significant improvements achieved by our approach over existing student modeling benchmarks, as evidenced by an increased average prediction quality and decreased variance across different student subgroups. Visual analysis of the resulting students' knowledge state embeddings confirms that our personalization methodology extracts activity patterns clustered into different student subgroups, consistent with the performance enhancements we obtain over the baselines.

4.3 · NI · Nov 7, 2023

Device Sampling and Resource Optimization for Federated Learning in Cooperative Edge Networks

Su Wang, Roberto Morabito, Seyyedali Hosseinalipour et al.

The conventional federated learning (FedL) architecture distributes machine learning (ML) across worker devices by having them train local models that are periodically aggregated by a server. FedL ignores two important characteristics of contemporary wireless networks, however: (i) the network may contain heterogeneous communication/computation resources, and (ii) there may be significant overlaps in devices' local data distributions. In this work, we develop a novel optimization methodology that jointly accounts for these factors via intelligent device sampling complemented by device-to-device (D2D) offloading. Our optimization methodology aims to select the best combination of sampled nodes and data offloading configuration to maximize FedL training accuracy while minimizing data processing and D2D communication resource consumption subject to realistic constraints on the network topology and device capabilities. Theoretical analysis of the D2D offloading subproblem leads to new FedL convergence bounds and an efficient sequential convex optimizer. Using these results, we develop a sampling methodology based on graph convolutional networks (GCNs) which learns the relationship between network attributes, sampled nodes, and D2D data offloading to maximize FedL accuracy. Through evaluation on popular datasets and real-world network measurements from our edge testbed, we find that our methodology outperforms popular device sampling methodologies from literature in terms of ML model performance, data processing overhead, and energy consumption.

12.3 · LG · Mar 22, 2023

Delay-Aware Hierarchical Federated Learning

Frank Po-Chen Lin, Seyyedali Hosseinalipour, Nicolò Michelusi et al.

Federated learning has gained popularity as a means of training models distributed across the wireless edge. The paper introduces delay-aware hierarchical federated learning (DFL) to improve the efficiency of distributed machine learning (ML) model training by accounting for communication delays between edge and cloud. Different from traditional federated learning, DFL leverages multiple stochastic gradient descent iterations on local datasets within each global aggregation period and intermittently aggregates model parameters through edge servers in local subnetworks. During global synchronization, the cloud server consolidates local models with the outdated global model using a local-global combiner, thus preserving crucial elements of both and enhancing learning efficiency in the presence of delay. A set of conditions is obtained to achieve the sublinear convergence rate of O(1/k) for strongly convex and smooth loss functions. Based on these findings, an adaptive control algorithm is developed for DFL, implementing policies to mitigate energy consumption and communication latency while aiming for sublinear convergence. Numerical evaluations show DFL's superior performance in terms of faster global model convergence, reduced resource consumption, and robustness against communication delays compared to existing FL algorithms. In summary, this proposed method offers improved efficiency and results when dealing with both convex and non-convex loss functions.
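
One way to picture a local-global combiner is as a weighted blend of the freshly aggregated local model and the outdated global model. The sketch below assumes a convex combination with a single weight `alpha`; the abstract does not state the combiner's exact form, so both the formula and the name `combine` are illustrative assumptions.

```python
def combine(local_model, stale_global, alpha):
    """Blend two parameter vectors; `alpha` weights the stale global model."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1 - alpha) * l + alpha * g for l, g in zip(local_model, stale_global)]


local_model = [1.0, 2.0]    # hypothetical aggregated local parameters
stale_global = [3.0, 6.0]   # hypothetical delayed global parameters
merged = combine(local_model, stale_global, 0.25)
print(merged)
```

Under this assumption, a longer edge-to-cloud delay would argue for a smaller `alpha`, since the global model is then more out of date relative to local progress.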

3.3 · IT · Feb 4, 2023

Digital Over-the-Air Federated Learning in Multi-Antenna Systems

Sihua Wang, Mingzhe Chen, Cong Shen et al.

In this paper, the performance optimization of federated learning (FL), when deployed over a realistic wireless multiple-input multiple-output (MIMO) communication system with digital modulation and over-the-air computation (AirComp), is studied. In particular, a MIMO system is considered in which edge devices transmit their local FL models (trained using their locally collected data) to a parameter server (PS) using beamforming to maximize the number of devices scheduled for transmission. The PS, acting as a central controller, generates a global FL model using the received local FL models and broadcasts it back to all devices. Due to the limited bandwidth in a wireless network, AirComp is adopted to enable efficient wireless data aggregation. However, fading of wireless channels can produce aggregate distortions in an AirComp-based FL scheme. To tackle this challenge, we propose a modified federated averaging (FedAvg) algorithm that combines digital modulation with AirComp to mitigate wireless fading while ensuring communication efficiency. This is achieved by a joint transmit and receive beamforming design, which is formulated as an optimization problem to dynamically adjust the beamforming matrices based on current FL model parameters so as to minimize the transmission error and ensure the FL performance. To achieve this goal, we first analytically characterize how the beamforming matrices affect the performance of FedAvg in different iterations. Based on this relationship, an artificial neural network (ANN) is used to estimate the local FL models of all devices and adjust the beamforming matrices at the PS for future model transmission. The algorithmic advantages and improved performance of the proposed methodologies are demonstrated through extensive numerical experiments.

6.6 · DC · Apr 24, 2023

Multi-Source to Multi-Target Decentralized Federated Domain Adaptation

Su Wang, Seyyedali Hosseinalipour, Christopher G. Brinton

Heterogeneity across devices in federated learning (FL) typically refers to statistical (e.g., non-i.i.d. data distributions) and resource (e.g., communication bandwidth) dimensions. In this paper, we focus on another important dimension that has received less attention: varying quantities/distributions of labeled and unlabeled data across devices. In order to leverage all data, we develop a decentralized federated domain adaptation methodology which considers the transfer of ML models from devices with high quality labeled data (called sources) to devices with low quality or unlabeled data (called targets). Our methodology, Source-Target Determination and Link Formation (ST-LF), optimizes both (i) classification of devices into sources and targets and (ii) source-target link formation, in a manner that considers the trade-off between ML model accuracy and communication energy efficiency. To obtain a concrete objective function, we derive a measurable generalization error bound that accounts for estimates of source-target hypothesis deviations and divergences between data distributions. The resulting optimization problem is a mixed-integer signomial program, a class of NP-hard problems, for which we develop an algorithm based on successive convex approximations to solve it tractably. Subsequent numerical evaluations of ST-LF demonstrate that it improves classification accuracy and energy efficiency over state-of-the-art baselines.

7.7 · LG · Oct 4, 2023

Digital Ethics in Federated Learning

Liangqi Yuan, Ziran Wang, Christopher G. Brinton

The Internet of Things (IoT) consistently generates vast amounts of data, sparking increasing concern over the protection of data privacy and the limitation of data misuse. Federated learning (FL) facilitates collaborative capabilities among multiple parties by sharing machine learning (ML) model parameters instead of raw user data, and it has recently gained significant attention for its potential in privacy preservation and learning efficiency enhancement. In this paper, we highlight the digital ethics concerns that arise when human-centric devices serve as clients in FL. More specifically, challenges of game dynamics, fairness, incentive, and continuity arise in FL due to differences in perspectives and objectives between clients and the server. We analyze these challenges and their solutions from the perspectives of both the client and the server, and through the viewpoints of centralized and decentralized FL. Finally, we explore the opportunities in FL for human-centric IoT as directions for future development.

1.2 · SY · May 7, 2022

Deep Reinforcement Learning-Based Adaptive IRS Control with Limited Feedback Codebooks

Junghoon Kim, Seyyedali Hosseinalipour, Andrew C. Marcum et al.

Intelligent reflecting surfaces (IRS) consist of configurable meta-atoms, which can alter the wireless propagation environment through design of their reflection coefficients. We consider adaptive IRS control in the practical setting where (i) the IRS reflection coefficients are attained by adjusting tunable elements embedded in the meta-atoms, (ii) the IRS reflection coefficients are affected by the incident angles of the incoming signals, (iii) the IRS is deployed in multi-path, time-varying channels, and (iv) the feedback link from the base station (BS) to the IRS has a low data rate. Conventional optimization-based IRS control protocols, which rely on channel estimation and conveying the optimized variables to the IRS, are not practical in this setting due to the difficulty of channel estimation and the low data rate of the feedback channel. To address these challenges, we develop a novel adaptive codebook-based limited feedback protocol to control the IRS. We propose two solutions for adaptive IRS codebook design: (i) random adjacency (RA), which utilizes correlations across the channel realizations, and (ii) deep neural network policy-based IRS control (DPIC), which is based on deep reinforcement learning. Numerical evaluations show that the data rate and average data rate over one coherence time are improved substantially by the proposed schemes.

3.3LGNov 23, 2022

Resource-Constrained Decentralized Federated Learning via Personalized Event-Triggering

Shahryar Zehtabi, Seyyedali Hosseinalipour, Christopher G. Brinton

Federated learning (FL) is a popular technique for distributing machine learning (ML) across a set of edge devices. In this paper, we study fully decentralized FL, where in addition to devices conducting training locally, they carry out model aggregations via cooperative consensus formation over device-to-device (D2D) networks. We introduce asynchronous, event-triggered communications among the devices to handle settings where access to a central server is not feasible. To account for the inherent resource heterogeneity and statistical diversity challenges in FL, we define personalized communication triggering conditions at each device that weigh the change in local model parameters against the available local network resources. We theoretically recover the $O(\ln{k} / \sqrt{k})$ convergence rate to the globally optimal model of decentralized gradient descent (DGD) methods in the setup of our methodology. We provide our convergence guarantees for the last iterates of models, under relaxed graph connectivity and data heterogeneity assumptions compared with the existing literature. To do so, we demonstrate a $B$-connected information flow guarantee in the presence of sporadic communications over the time-varying D2D graph. Our subsequent numerical evaluations demonstrate that our methodology obtains substantial improvements in convergence speed and/or communication savings compared to existing decentralized FL baselines.
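To make the idea of a personalized triggering condition concrete, the following is a minimal sketch, not the paper's actual rule. It assumes a hypothetical condition in which a device broadcasts only when the root-mean-square change in its parameters since the last broadcast exceeds a threshold that grows as its available bandwidth shrinks. The names `should_broadcast`, `bandwidth`, and `base_threshold` are illustrative assumptions.

```python
import math


def should_broadcast(current, last_sent, bandwidth, base_threshold=1.0):
    """Decide whether a device sends its model to its neighbors.

    Hypothetical rule (an assumption, not taken from the paper): compare the
    normalized parameter change against a threshold that is inversely
    proportional to the device's available bandwidth.
    """
    n = len(current)
    # Root-mean-square change since the last broadcast.
    change = math.sqrt(sum((c - s) ** 2 for c, s in zip(current, last_sent)) / n)
    # Personalized threshold: fewer resources means a higher bar to communicate.
    threshold = base_threshold / bandwidth
    return change >= threshold


model_now = [0.9, 1.4, -0.2]
model_sent = [0.5, 1.0, 0.2]
print(should_broadcast(model_now, model_sent, bandwidth=4.0))
print(should_broadcast(model_now, model_sent, bandwidth=0.5))
```

With the same parameter change, the well-provisioned device triggers a broadcast while the constrained one stays silent, which is the kind of trade-off the abstract describes.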

18.4LGNov 1, 2023

StableFDG: Style and Attention Based Learning for Federated Domain Generalization

Jungwuk Park, Dong-Jun Han, Jinho Kim et al.

Traditional federated learning (FL) algorithms operate under the assumption that the data distributions at training (source domains) and testing (target domain) are the same. The fact that domain shifts often occur in practice necessitates equipping FL methods with a domain generalization (DG) capability. However, existing DG algorithms face fundamental challenges in FL setups due to the lack of samples/domains in each client's local dataset. In this paper, we propose StableFDG, a style and attention based learning strategy for accomplishing federated domain generalization, introducing two key contributions. The first is style-based learning, which enables each client to explore novel styles beyond the original source domains in its local dataset, improving domain diversity based on the proposed style sharing, shifting, and exploration strategies. Our second contribution is an attention-based feature highlighter, which captures the similarities between the features of data samples in the same class, and emphasizes the important/common characteristics to better learn the domain-invariant characteristics of each class in data-poor FL scenarios. Experimental results show that StableFDG outperforms existing baselines on various DG benchmark datasets, demonstrating its efficacy.

9.7DCJul 20, 2023

Communication-Efficient Split Learning via Adaptive Feature-Wise Compression

Yongjeong Oh, Jaeho Lee, Christopher G. Brinton et al.

This paper proposes a novel communication-efficient split learning (SL) framework, named SplitFC, which reduces the communication overhead required for transmitting intermediate feature and gradient vectors during the SL training process. The key idea of SplitFC is to leverage different dispersion degrees exhibited in the columns of the matrices. SplitFC incorporates two compression strategies: (i) adaptive feature-wise dropout and (ii) adaptive feature-wise quantization. In the first strategy, the intermediate feature vectors are dropped with adaptive dropout probabilities determined based on the standard deviation of these vectors. Then, by the chain rule, the intermediate gradient vectors associated with the dropped feature vectors are also dropped. In the second strategy, the non-dropped intermediate feature and gradient vectors are quantized using adaptive quantization levels determined based on the ranges of the vectors. To minimize the quantization error, the optimal quantization levels of this strategy are derived in a closed-form expression. Simulation results on the MNIST, CIFAR-100, and CelebA datasets demonstrate that SplitFC outperforms state-of-the-art SL frameworks by significantly reducing communication overheads while maintaining high accuracy.
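The two strategies can be illustrated with a rough sketch. It assumes, hypothetically, that keep probabilities are proportional to each column's standard deviation and that kept columns are uniformly quantized over their own range. The closed-form optimal quantization levels mentioned in the abstract are not reproduced here, and all function names are illustrative.

```python
import random
import statistics


def keep_probabilities(columns, avg_keep=0.5):
    """Give each feature column a keep probability proportional to its std.

    Assumption for illustration: low-dispersion columns carry less
    information and are dropped more often.
    """
    stds = [statistics.pstdev(col) for col in columns]
    total = sum(stds)
    if total == 0:
        return [avg_keep] * len(columns)
    # Scale so the average keep probability is about avg_keep, clipped at 1.
    return [min(1.0, avg_keep * len(columns) * s / total) for s in stds]


def quantize(col, levels):
    """Uniformly quantize a column to `levels` points spanning its range."""
    lo, hi = min(col), max(col)
    if hi == lo or levels < 2:
        return list(col)
    step = (hi - lo) / (levels - 1)
    return [lo + round((x - lo) / step) * step for x in col]


def compress(columns, levels, rng, avg_keep=0.5):
    """Drop columns at random, then quantize the survivors.

    Returns a dict mapping the index of each kept column to its
    quantized values.
    """
    probs = keep_probabilities(columns, avg_keep)
    kept = {}
    for idx, (col, p) in enumerate(zip(columns, probs)):
        if rng.random() < p:
            kept[idx] = quantize(col, levels)
    return kept


features = [[1, 1, 1, 1], [0, 2, 4, 6]]
print(compress(features, levels=2, rng=random.Random(0)))
```

In this toy input the constant column has zero standard deviation and is always dropped, while the high-variance column is always kept and coarsely quantized.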

4.6LGApr 28, 2022Code

Process-BERT: A Framework for Representation Learning on Educational Process Data

Alexander Scarlatos, Christopher Brinton, Andrew Lan

Educational process data, i.e., logs of detailed student activities in computerized or online learning platforms, has the potential to offer deep insights into how students learn. One can use process data for many downstream tasks such as learning outcome prediction and automatically delivering personalized intervention. However, analyzing process data is challenging since the specific format of process data varies a lot depending on different learning/testing scenarios. In this paper, we propose a framework for learning representations of educational process data that is applicable across many different learning scenarios. Our framework consists of a pre-training step that uses BERT-type objectives to learn representations from sequential process data and a fine-tuning step that further adjusts these representations on downstream prediction tasks. We apply our framework to the 2019 Nation's Report Card Data Mining Competition dataset, which consists of student problem-solving process data, and detail the specific models we use in this scenario. We conduct both quantitative and qualitative experiments to show that our framework results in process data representations that are both predictive and informative.

7.9LGSep 7, 2024

Unlocking the Potential of Model Calibration in Federated Learning

Yun-Wei Chu, Dong-Jun Han, Seyyedali Hosseinalipour et al.

Over the past several years, various federated learning (FL) methodologies have been developed to improve model accuracy, a primary performance metric in machine learning. However, to utilize FL in practical decision-making scenarios, beyond considering accuracy, the trained model must also have a reliable confidence in each of its predictions, an aspect that has been largely overlooked in existing FL research. Motivated by this gap, we propose Non-Uniform Calibration for Federated Learning (NUCFL), a generic framework that integrates FL with the concept of model calibration. The inherent data heterogeneity in FL environments makes model calibration particularly difficult, as it must ensure reliability across diverse data distributions and client conditions. Our NUCFL addresses this challenge by dynamically adjusting the model calibration objectives based on statistical relationships between each client's local model and the global model in FL. In particular, NUCFL assesses the similarity between local and global model relationships, and controls the penalty term for the calibration loss during client-side local training. By doing so, NUCFL effectively aligns calibration needs for the global model in heterogeneous FL settings while not sacrificing accuracy. Extensive experiments show that NUCFL offers flexibility and effectiveness across various FL algorithms, enhancing accuracy as well as model calibration.

18.4LGOct 9, 2023

Improved Communication Efficiency in Federated Natural Policy Gradient via ADMM-based Gradient Updates

Guangchen Lan, Han Wang, James Anderson et al.

Federated reinforcement learning (FedRL) enables agents to collaboratively train a global policy without sharing their individual data. However, high communication overhead remains a critical bottleneck, particularly for natural policy gradient (NPG) methods, which are second-order. To address this issue, we propose the FedNPG-ADMM framework, which leverages the alternating direction method of multipliers (ADMM) to approximate global NPG directions efficiently. We theoretically demonstrate that using ADMM-based gradient updates reduces communication complexity from ${O}({d^{2}})$ to ${O}({d})$ at each iteration, where $d$ is the number of model parameters. Furthermore, we show that achieving an $ε$-error stationary convergence requires ${O}(\frac{1}{(1-γ)^{2}ε})$ iterations for discount factor $γ$, demonstrating that FedNPG-ADMM maintains the same convergence rate as the standard FedNPG. Through evaluation of the proposed algorithms in MuJoCo environments, we demonstrate that FedNPG-ADMM maintains the reward performance of standard FedNPG, and that its convergence rate improves when the number of federated agents increases.

11.4LGMay 9

PAAC: Privacy-Aware Agentic Device-Cloud Collaboration

Liangqi Yuan, Wenzhi Fang, Shiqiang Wang et al.

Large language model (LLM) agents face a structural tension: cloud agents provide strong reasoning but expose user data, while on-device agents preserve privacy at the cost of overall capability. Existing device-cloud designs treat this boundary as a compute split rather than a trust boundary suited to agentic workloads, and existing sanitizers force a choice between policy flexibility and the structural fidelity tool calls require. In this work, we develop PAAC, a privacy-aware agentic framework that aligns planner-executor decomposition with the device-cloud boundary so that role specialization itself becomes the privacy mechanism. The cloud agent reasons over typed placeholder tokens that preserve each sensitive value's reasoning role while discarding its content, while the on-device agent identifies sensitive spans and distills each step's execution outcome into compact key findings. Sanitization confines the on-device LLM to proposing which spans to mask, while a deterministic registry performs all substitution and reversal, keeping actions directly executable on device. On three agentic benchmarks under strict privacy settings, PAAC dominates the Pareto frontier of privacy and accuracy, improving average accuracy by 15-36% and reducing average leakage by 2-6$\times$ over state-of-the-art device-cloud baselines, with the largest margins on privacy targets outside fixed entity taxonomies. We find consistent improvements on 17 additional benchmarks spanning 10 domains, including math, science, and finance.

8.9NIMar 16

A Hierarchical Gradient Tracking Algorithm for Mitigating Subnet-Drift in Fog Learning Networks

Evan Chen, Shiqiang Wang, Christopher G. Brinton

Federated learning (FL) encounters scalability challenges when implemented over fog networks that do not follow FL's conventional star topology architecture. Semi-decentralized FL (SD-FL) has been proposed as a solution for device-to-device (D2D) enabled networks that divides model cooperation into two stages: at the lower stage, D2D communications are employed for local model aggregations within subnetworks (subnets), while the upper stage handles device-server (DS) communications for global model aggregations. However, existing SD-FL schemes are based on gradient diversity assumptions that become performance bottlenecks as data distributions become more heterogeneous. In this work, we develop semi-decentralized gradient tracking (SD-GT), the first SD-FL methodology that removes the need for such assumptions by incorporating tracking terms into device updates for each communication layer. Our analytical characterization of SD-GT reveals upper bounds on convergence for non-convex, convex, and strongly-convex problems. We show how the bounds enable the development of an optimization algorithm that navigates the performance-efficiency trade-off by tuning subnet sampling rate and D2D rounds for each global training interval. Our subsequent numerical evaluations demonstrate that SD-GT obtains substantial improvements in trained model quality and communication cost relative to baselines in SD-FL and gradient tracking on several datasets.

12.6AIMay 9

Iterative Critique-and-Routing Controller for Multi-Agent Systems with Heterogeneous LLMs

Wenzhi Fang, Liangqi Yuan, Guangchen Lan et al.

Multi-agent large language model (LLM) systems often rely on a controller to coordinate a pool of heterogeneous models, yet existing controllers are typically limited to one-shot routing: they select a model once and return its output directly. Such routing-only designs provide no mechanism to critique intermediate drafts or support iterative refinement. To address this limitation, we propose a critique-and-routing controller that casts multi-agent coordination as a sequential decision problem. At each turn, the controller evaluates the current draft, decides whether to stop or continue, and, if needed, selects the next agent for further refinement. We formulate this process as a finite-horizon Markov Decision Process (MDP) with explicit agent-utilization constraints, design a composite reward for controller decisions across turns, and optimize the controller via policy gradients under a Lagrangian-relaxed objective. Extensive experiments across multiple heterogeneous multi-agent systems and seven reasoning benchmarks show that our method consistently outperforms state-of-the-art baselines and substantially narrows the gap to the strongest agent, while using it for fewer than 25% of total calls.

12.3LGOct 10, 2023

FedMFS: Federated Multimodal Fusion Learning with Selective Modality Communication

Liangqi Yuan, Dong-Jun Han, Vishnu Pandi Chellapandi et al.

Multimodal federated learning (FL) aims to enrich model training in FL settings where devices are collecting measurements across multiple modalities (e.g., sensors measuring pressure, motion, and other types of data). However, key challenges to multimodal FL remain unaddressed, particularly in heterogeneous network settings: (i) the set of modalities collected by each device will be diverse, and (ii) communication limitations prevent devices from uploading all their locally trained modality models to the server. In this paper, we propose Federated Multimodal Fusion learning with Selective modality communication (FedMFS), a new multimodal fusion FL methodology that can tackle the above-mentioned challenges. The key idea is the introduction of a modality selection criterion for each device, which weighs (i) the impact of the modality, gauged by Shapley value analysis, against (ii) the modality model size as a gauge for communication overhead. This enables FedMFS to flexibly balance performance against communication costs, depending on resource constraints and application requirements. Experiments on the real-world ActionSense dataset demonstrate the ability of FedMFS to achieve comparable accuracy to several baselines while reducing the communication overhead by over 4x.
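A selection criterion of this kind can be sketched as a weighted score. The snippet below is an illustrative assumption rather than the FedMFS formula: it normalizes precomputed Shapley magnitudes and model sizes, combines them with a hypothetical weight `alpha`, and uploads the `top_k` highest-priority modality models. The modality names and numbers are made up for the example.

```python
def select_modalities(shapley, size_mb, alpha=0.5, top_k=1):
    """Rank modalities by impact minus communication cost.

    shapley: modality -> Shapley value (assumed to be computed elsewhere)
    size_mb: modality -> modality model size in megabytes
    alpha:   hypothetical weight; 1.0 means impact only, 0.0 means size only
    """
    max_impact = max(abs(v) for v in shapley.values()) or 1.0
    max_size = max(size_mb.values()) or 1.0

    def priority(name):
        impact = abs(shapley[name]) / max_impact
        cost = size_mb[name] / max_size
        return alpha * impact - (1 - alpha) * cost

    # Highest priority first; ties are broken alphabetically for determinism.
    ranked = sorted(shapley, key=lambda name: (-priority(name), name))
    return ranked[:top_k]


shapley = {"pressure": 0.30, "motion": 0.25, "camera": 0.32}
size_mb = {"pressure": 2.0, "motion": 1.0, "camera": 40.0}
print(select_modalities(shapley, size_mb, alpha=0.5))
print(select_modalities(shapley, size_mb, alpha=1.0))
```

With a balanced weight, the small pressure model is preferred over the slightly more informative but far larger camera model; with `alpha=1.0` the ranking follows impact alone.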

7.7LGOct 27, 2023

Federated Learning over Hierarchical Wireless Networks: Training Latency Minimization via Submodel Partitioning

Wenzhi Fang, Dong-Jun Han, Christopher G. Brinton

Hierarchical federated learning (HFL) has demonstrated promising scalability advantages over the traditional "star-topology" architecture-based federated learning (FL). However, HFL still imposes significant computation, communication, and storage burdens on the edge, especially when training a large-scale model over resource-constrained wireless devices. In this paper, we propose hierarchical independent submodel training (HIST), a new FL methodology that aims to address these issues in hierarchical cloud-edge-client networks. The key idea behind HIST is to divide the global model into disjoint partitions (or submodels) per round so that each group of clients (i.e., cells) is responsible for training only one partition of the model. We characterize the convergence behavior of HIST under mild assumptions, showing the impacts of several key attributes (e.g., submodel sizes, number of cells, edge and global aggregation frequencies) on the rate and stationarity gap. Building upon the theoretical results, we propose a submodel partitioning strategy to minimize the training latency depending on network resource availability and a target learning performance guarantee. We then demonstrate how HIST can be augmented with over-the-air computation (AirComp) to further enhance the efficiency of the model aggregation over the edge cells. Through numerical evaluations, we verify that HIST is able to save training time and communication costs by wide margins while achieving accuracy comparable to conventional HFL. Moreover, our experiments demonstrate that AirComp-assisted HIST provides further improvements in training latency.

4.1CRNov 14, 2023Code

The Impact of Adversarial Node Placement in Decentralized Federated Learning Networks

Adam Piaseczny, Eric Ruzomberka, Rohit Parasnis et al.

As Federated Learning (FL) grows in popularity, new decentralized frameworks are becoming widespread. These frameworks leverage the benefits of decentralized environments to enable fast and energy-efficient inter-device communication. However, this growing popularity also intensifies the need for robust security measures. While existing research has explored various aspects of FL security, the role of adversarial node placement in decentralized networks remains largely unexplored. This paper addresses this gap by analyzing the performance of decentralized FL for various adversarial placement strategies when adversaries can jointly coordinate their placement within a network. We establish two baseline strategies for placing adversarial nodes: random placement and network centrality-based placement. Building on this foundation, we propose a novel attack algorithm that prioritizes adversarial spread over adversarial centrality by maximizing the average network distance between adversaries. We show that the new attack algorithm significantly impacts key performance metrics such as testing accuracy, outperforming the baseline frameworks by between $9\%$ and $66.5\%$ for the considered setups. Our findings provide valuable insights into the vulnerabilities of decentralized FL systems, setting the stage for future research aimed at developing more secure and robust decentralized FL frameworks.
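The notion of prioritizing spread can be illustrated with a simple greedy heuristic. This is an assumption for exposition, not the paper's attack algorithm: starting from a given node, it repeatedly adds the node whose summed hop distance to the already chosen adversaries is largest. The graph is assumed to be connected and given as an adjacency dictionary; `spread_placement` and `seed_node` are hypothetical names.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def spread_placement(graph, num_adversaries, seed_node):
    """Greedily pick nodes that are far from the adversaries chosen so far.

    Assumes a connected graph. Ties go to the smallest node id because
    candidates are scanned in sorted order and max() keeps the first maximum.
    """
    dist = {node: bfs_distances(graph, node) for node in graph}
    chosen = [seed_node]
    while len(chosen) < min(num_adversaries, len(graph)):
        candidates = [n for n in sorted(graph) if n not in chosen]
        best = max(candidates, key=lambda n: sum(dist[n][c] for c in chosen))
        chosen.append(best)
    return chosen


# A five-node path: 0 - 1 - 2 - 3 - 4
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(spread_placement(path, 2, seed_node=0))
```

On the path graph, the second adversary lands at the opposite end from the first, which is the behavior a spread-maximizing placement would be expected to show.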

21.1LGApr 9, 2024

Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis

Guangchen Lan, Dong-Jun Han, Abolfazl Hashemi et al.

To improve the efficiency of reinforcement learning (RL), we propose a novel asynchronous federated reinforcement learning (FedRL) framework termed AFedPG, which constructs a global model through collaboration among $N$ agents using policy gradient (PG) updates. To address the challenge of lagged policies in asynchronous settings, we design a delay-adaptive lookahead technique specifically for FedRL that can effectively handle heterogeneous arrival times of policy gradients. We analyze the theoretical global convergence bound of AFedPG, and characterize the advantage of the proposed algorithm in terms of both the sample complexity and time complexity. Specifically, our AFedPG method achieves $O(\frac{ε^{-2.5}}{N})$ sample complexity for global convergence at each agent on average. Compared to the single agent setting with $O(ε^{-2.5})$ sample complexity, it enjoys a linear speedup with respect to the number of agents. Moreover, compared to synchronous FedPG, AFedPG improves the time complexity from $O(\frac{t_{\max}}{N})$ to $O({\sum_{i=1}^{N} \frac{1}{t_{i}}})^{-1}$, where $t_{i}$ denotes the time consumption in each iteration at agent $i$, and $t_{\max}$ is the largest one. The latter complexity $O({\sum_{i=1}^{N} \frac{1}{t_{i}}})^{-1}$ is always smaller than the former one, and this improvement becomes significant in large-scale federated settings with heterogeneous computing powers ($t_{\max}\gg t_{\min}$). Finally, we empirically verify the improved performance of AFedPG in four widely used MuJoCo environments with varying numbers of agents. We also demonstrate the advantages of AFedPG in various computing heterogeneity scenarios.
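The comparison between the two time-complexity terms can be checked numerically. The sketch below evaluates only the leading quantities $t_{\max}/N$ and $(\sum_{i} 1/t_{i})^{-1}$ with constants dropped; the per-agent iteration times are made-up values. Since $1/t_{i} \ge 1/t_{\max}$ for every agent, the sum is at least $N/t_{\max}$, so the asynchronous term never exceeds the synchronous one, with equality when all agents are equally fast.

```python
def sync_time(iteration_times):
    """Leading term for synchronous FedPG: t_max / N."""
    return max(iteration_times) / len(iteration_times)


def async_time(iteration_times):
    """Leading term for AFedPG: (sum_i 1 / t_i) ** -1."""
    return 1.0 / sum(1.0 / t for t in iteration_times)


# Hypothetical per-iteration times: three fast agents and one straggler.
heterogeneous = [1.0, 1.0, 1.0, 10.0]
print(sync_time(heterogeneous), async_time(heterogeneous))

# Identical agents: the two terms coincide.
homogeneous = [2.0, 2.0, 2.0, 2.0]
print(sync_time(homogeneous), async_time(homogeneous))
```

In the heterogeneous case the synchronous term is dominated by the single straggler, while the asynchronous term stays close to what the three fast agents alone would achieve.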

15.7LGJan 30, 2024

Communication-Efficient Multimodal Federated Learning: Joint Modality and Client Selection

Liangqi Yuan, Dong-Jun Han, Su Wang et al.

Multimodal federated learning (FL) aims to enrich model training in FL settings where clients are collecting measurements across multiple modalities. However, key challenges to multimodal FL remain unaddressed, particularly in heterogeneous network settings where: (i) the set of modalities collected by each client will be diverse, and (ii) communication limitations prevent clients from uploading all their locally trained modality models to the server. In this paper, we propose multimodal Federated learning with joint Modality and Client selection (mmFedMC), a new FL methodology that can tackle the above-mentioned challenges in multimodal settings. The joint selection algorithm incorporates two main components: (a) A modality selection methodology for each client, which weighs (i) the impact of the modality, gauged by Shapley value analysis, (ii) the modality model size as a gauge of communication overhead, against (iii) the frequency of modality model updates, denoted recency, to enhance generalizability. (b) A client selection strategy for the server based on the local loss of modality model at each client. Experiments on five real-world datasets demonstrate the ability of mmFedMC to achieve comparable accuracy to several baselines while reducing the communication overhead by over 20x. A demo video of our methodology is available at https://liangqiy.com/mmfedmc/.

14.4LGMay 8, 2025

Federated Learning for Cyber Physical Systems: A Comprehensive Survey

Minh K. Quan, Pubudu N. Pathirana, Mayuri Wijayasundara et al.

The integration of machine learning (ML) in cyber physical systems (CPS) is a complex task due to the challenges that arise in terms of real-time decision making, safety, reliability, device heterogeneity, and data privacy. There are also open research questions that must be addressed in order to fully realize the potential of ML in CPS. Federated learning (FL), a distributed approach to ML, has become increasingly popular in recent years. It allows models to be trained using data from decentralized sources. This approach has been gaining popularity in the CPS field, as it integrates computer, communication, and physical processes. Therefore, the purpose of this work is to provide a comprehensive analysis of the most recent developments of FL-CPS, including the numerous application areas, system topologies, and algorithms developed in recent years. The paper starts by discussing recent advances in both FL and CPS, followed by their integration. Then, the paper compares the application of FL in CPS with its applications in the internet of things (IoT) in further depth to show their connections and distinctions. Furthermore, the article scrutinizes how FL is utilized in critical CPS applications, e.g., intelligent transportation systems, cybersecurity services, smart cities, and smart healthcare solutions. The study also includes critical insights and lessons learned from various FL-CPS implementations. The paper's concluding section delves into significant concerns and suggests avenues for further research in this fast-paced and dynamic era.

21.3LGFeb 16, 2025Code

Local-Cloud Inference Offloading for LLMs in Multi-Modal, Multi-Task, Multi-Dialogue Settings

Liangqi Yuan, Dong-Jun Han, Shiqiang Wang et al.

Compared to traditional machine learning models, recent large language models (LLMs) can exhibit multi-task-solving capabilities through multiple dialogues and multi-modal data sources. These unique characteristics of LLMs, together with their large model size, make their deployment more challenging. Specifically, (i) deploying LLMs on local devices faces computational, memory, and energy resource issues, while (ii) deploying them in the cloud cannot guarantee real-time service and incurs communication/usage costs. In this paper, we design TMO, a local-cloud LLM inference system with Three-M Offloading: Multi-modal, Multi-task, and Multi-dialogue. TMO incorporates (i) a lightweight local LLM that can process simple tasks at high speed and (ii) a large-scale cloud LLM that can handle multi-modal data sources. We develop a resource-constrained reinforcement learning (RCRL) strategy for TMO that optimizes the inference location (i.e., local vs. cloud) and multi-modal data sources to use for each task/dialogue, aiming to maximize the long-term reward (response quality, latency, and usage cost) while adhering to resource constraints. We also contribute M4A1, a new dataset we curated that contains reward and cost metrics across multiple modality, task, dialogue, and LLM configurations, enabling evaluation of offloading decisions. We demonstrate the effectiveness of TMO compared to several exploration-decision and LLM-as-Agent baselines, showing significant improvements in latency, cost, and response quality.

26.2AIMay 29, 2025

Contextual Integrity in LLMs via Reasoning and Reinforcement Learning

Guangchen Lan, Huseyin A. Inan, Sahar Abdelnabi et al.

As the era of autonomous agents making decisions on behalf of users unfolds, ensuring contextual integrity (CI) -- what is the appropriate information to share while carrying out a certain task -- becomes a central question to the field. We posit that CI demands a form of reasoning where the agent needs to reason about the context in which it is operating. To test this, we first prompt LLMs to reason explicitly about CI when deciding what information to disclose. We then extend this approach by developing a reinforcement learning (RL) framework that further instills in models the reasoning necessary to achieve CI. Using a synthetic, automatically created dataset of only $\sim700$ examples but with diverse contexts and information disclosure norms, we show that our method substantially reduces inappropriate information disclosure while maintaining task performance across multiple model sizes and families. Importantly, improvements transfer from this synthetic dataset to established CI benchmarks such as PrivacyLens, which has human annotations and evaluates privacy leakage of AI assistants in actions and tool calls.

2.7CLJan 15, 2024

Only Send What You Need: Learning to Communicate Efficiently in Federated Multilingual Machine Translation

Yun-Wei Chu, Dong-Jun Han, Christopher G. Brinton

Federated learning (FL) is a promising distributed machine learning paradigm that enables multiple clients to collaboratively train a global model. In this paper, we focus on a practical federated multilingual learning setup where clients with their own language-specific data aim to collaboratively construct a high-quality neural machine translation (NMT) model. However, communication constraints in practical network systems present challenges for exchanging large-scale NMT engines between FL parties. We propose a meta-learning-based adaptive parameter selection methodology, MetaSend, that improves the communication efficiency of model transmissions from clients during FL-based multilingual NMT training. Our approach learns a dynamic threshold for filtering parameters prior to transmission without compromising the NMT model quality, based on the tensor deviations of clients between different FL rounds. Through experiments on two NMT datasets with different language distributions, we demonstrate that MetaSend obtains substantial improvements over baselines in translation quality in the presence of a limited communication budget.

11.5LGFeb 5, 2024Code

Decentralized Sporadic Federated Learning: A Unified Algorithmic Framework with Convergence Guarantees

Shahryar Zehtabi, Dong-Jun Han, Rohit Parasnis et al.

Decentralized federated learning (DFL) captures FL settings where both (i) model updates and (ii) model aggregations are exclusively carried out by the clients without a central server. Existing DFL works have mostly focused on settings where clients conduct a fixed number of local updates between local model exchanges, overlooking heterogeneity and dynamics in communication and computation capabilities. In this work, we propose Decentralized Sporadic Federated Learning ($\texttt{DSpodFL}$), a DFL methodology built on a generalized notion of $\textit{sporadicity}$ in both local gradient and aggregation processes. $\texttt{DSpodFL}$ subsumes many existing decentralized optimization methods under a unified algorithmic framework by modeling the per-iteration (i) occurrence of gradient descent at each client and (ii) exchange of models between client pairs as arbitrary indicator random variables, thus capturing $\textit{heterogeneous and time-varying}$ computation/communication scenarios. We analytically characterize the convergence behavior of $\texttt{DSpodFL}$ for both convex and non-convex models and for both constant and diminishing learning rates, under mild assumptions on the communication graph connectivity, data heterogeneity across clients, and gradient noises. We show how our bounds recover existing results from decentralized gradient descent as special cases. Experiments demonstrate that $\texttt{DSpodFL}$ consistently achieves improved training speeds compared with baselines under various system settings.
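The role of the indicator random variables can be sketched on a toy problem. The snippet below is a simplified illustration under stated assumptions, not the DSpodFL algorithm itself: models are scalars, each client $i$ has the local loss $0.5\,(x - c_{i})^{2}$, and the indicators are independent Bernoulli draws with hypothetical probabilities `grad_prob` and `link_prob`. Setting both to 1 recovers plain decentralized gradient descent.

```python
import random


def sporadic_step(models, targets, neighbors, grad_prob, link_prob, lr, mix, rng):
    """One synchronous iteration with sporadic gradients and exchanges."""
    n = len(models)
    # One indicator per undirected link so that both endpoints agree.
    link_on = {}
    for i in range(n):
        for j in neighbors[i]:
            if i < j:
                link_on[(i, j)] = rng.random() < link_prob
    new_models = []
    for i in range(n):
        # Consensus term over the links that are active this iteration.
        consensus = sum(
            mix * (models[j] - models[i])
            for j in neighbors[i]
            if link_on[(min(i, j), max(i, j))]
        )
        # Indicator for whether client i computes a gradient this iteration.
        do_grad = rng.random() < grad_prob[i]
        grad = (models[i] - targets[i]) if do_grad else 0.0
        new_models.append(models[i] + consensus - lr * grad)
    return new_models


def run(steps, grad_prob, link_prob, seed=0):
    rng = random.Random(seed)
    targets = [0.0, 2.0, 4.0]            # local minimizers c_i
    neighbors = [[1, 2], [0, 2], [0, 1]]  # fully connected, three clients
    models = [0.0, 0.0, 0.0]
    for _ in range(steps):
        models = sporadic_step(models, targets, neighbors, grad_prob,
                               link_prob, lr=0.05, mix=0.3, rng=rng)
    return models


print(run(500, grad_prob=[1.0, 1.0, 1.0], link_prob=1.0))
print(run(500, grad_prob=[0.9, 0.5, 0.2], link_prob=0.6))
```

The first call uses always-on indicators, so the average model settles at the global minimizer 2.0 while individual models keep a small constant-step-size offset. The second call shows how heterogeneous gradient and link probabilities can be plugged into the same update.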

6.4LGApr 15, 2024

Unsupervised Federated Optimization at the Edge: D2D-Enabled Learning without Labels

Satyavrat Wagle, Seyyedali Hosseinalipour, Naji Khosravan et al.

Federated learning (FL) is a popular solution for distributed machine learning (ML). While FL has traditionally been studied for supervised ML tasks, in many applications, it is impractical to assume availability of labeled data across devices. To this end, we develop Cooperative Federated unsupervised Contrastive Learning (CF-CL) to facilitate FL across edge devices with unlabeled datasets. CF-CL employs local device cooperation where either explicit (i.e., raw data) or implicit (i.e., embeddings) information is exchanged through device-to-device (D2D) communications to improve local diversity. Specifically, we introduce a smart information push-pull methodology for data/embedding exchange tailored to FL settings with either soft or strict data privacy restrictions. Information sharing is conducted through a probabilistic importance sampling technique at receivers leveraging a carefully crafted reserve dataset provided by transmitters. In the implicit case, embedding exchange is further integrated into the local ML training at the devices via a regularization term incorporated into the contrastive loss, augmented with a dynamic contrastive margin to adjust the volume of latent space explored. Numerical evaluations demonstrate that CF-CL leads to alignment of latent spaces learned across devices, results in faster and more efficient global model training, and is effective in extreme non-i.i.d. data distribution settings across devices.

6.4LGFeb 15, 2024

Smart Information Exchange for Unsupervised Federated Learning via Reinforcement Learning

Seohyun Lee, Anindya Bijoy Das, Satyavrat Wagle et al.

One of the main challenges of decentralized machine learning paradigms such as Federated Learning (FL) is the presence of local non-i.i.d. datasets. Device-to-device (D2D) transfers between distributed devices have been shown to be an effective tool for dealing with this problem and to be robust to stragglers. In an unsupervised case, however, it is not obvious how data exchanges should take place due to the absence of labels. In this paper, we propose an approach to create an optimal graph for data transfer using Reinforcement Learning. The goal is to form links that will provide the most benefit considering the environment's constraints and improve convergence speed in an unsupervised FL environment. Numerical analysis shows the advantages of the proposed method, in terms of convergence speed and straggler resilience, over different available FL schemes on benchmark datasets.

5.3LGDec 27, 2023

Robust Collaborative Inference with Vertically Split Data Over Dynamic Device Environments

Surojit Ganguli, Zeyu Zhou, Christopher G. Brinton et al.

When each edge device of a network only perceives a local part of the environment, collaborative inference across multiple devices is often needed to predict global properties of the environment. In safety-critical applications, collaborative inference must be robust to significant network failures caused by environmental disruptions or extreme weather. Existing collaborative learning approaches, such as privacy-focused Vertical Federated Learning (VFL), typically assume a centralized setup or that one device never fails. However, these assumptions make prior approaches susceptible to significant network failures. To address this problem, we first formalize the problem of robust collaborative inference over a dynamic network of devices that could experience significant network faults. Then, we develop a minimalistic yet impactful method called Multiple Aggregation with Gossip Rounds and Simulated Faults (MAGS) that synthesizes simulated faults via dropout, replication, and gossiping to significantly improve robustness over baselines. We also theoretically analyze our proposed approach to explain why each component enhances robustness. Extensive empirical results validate that MAGS is robust across a range of fault rates, including extreme fault rates.
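
As a rough illustration of the gossiping idea named in the abstract above, the sketch below runs neighbor averaging over a small device graph and skips devices that have failed. It is a generic gossip-averaging example, not the MAGS method itself; the function name `gossip_round`, the ring topology, and the starting estimates are all hypothetical.

```python
def gossip_round(values, neighbors, alive):
    """One synchronous gossip round; failed devices neither send nor update."""
    new_values = list(values)
    for i in alive:
        # Only average with neighbors that are currently reachable.
        peers = [j for j in neighbors[i] if j in alive]
        new_values[i] = (values[i] + sum(values[j] for j in peers)) / (1 + len(peers))
    return new_values


# Hypothetical 4-device ring; each device starts from its own local estimate.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
estimates = [0.0, 4.0, 8.0, 4.0]
alive = {0, 1, 2, 3}

for _ in range(50):
    estimates = gossip_round(estimates, neighbors, alive)

# Same starting point, but device 2 has failed for this round.
faulty = gossip_round([0.0, 4.0, 8.0, 4.0], neighbors, {0, 1, 3})
```

With every device alive on this ring, the estimates approach the mean of the starting values. When a device fails, its neighbors simply average over whichever peers remain, which is the kind of degraded condition the abstract says robustness must cover.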

1.2SYApr 22, 2024

Multi-Agent Hybrid SAC for Joint SS-DSA in CRNs

David R. Nickel, Anindya Bijoy Das, David J. Love et al.

Opportunistic spectrum access has the potential to increase the efficiency of spectrum utilization in cognitive radio networks (CRNs). In CRNs, both spectrum sensing and resource allocation (SSRA) are critical to maximizing system throughput while minimizing collisions of secondary users with the primary network. However, many works in dynamic spectrum access do not consider the impact of imperfect sensing information such as mis-detected channels, which the additional information available in joint SSRA can help remediate. In this work, we examine joint SSRA as an optimization which seeks to maximize a CRN's net communication rate subject to constraints on channel sensing, channel access, and transmit power. Given the non-trivial nature of the problem, we leverage multi-agent reinforcement learning to enable a network of secondary users to dynamically access unoccupied spectrum via only local test statistics, formulated under the energy detection paradigm of spectrum sensing. In doing so, we develop a novel multi-agent implementation of hybrid soft actor critic, MHSAC, based on the QMIX mixing scheme. Through experiments, we find that our SSRA algorithm, HySSRA, is successful in maximizing the CRN's utilization of spectrum resources while also limiting its interference with the primary network, and outperforms the current state-of-the-art by a wide margin. We also explore the impact of wireless variations such as coherence time on the efficacy of the system.
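
As a rough illustration of the energy detection paradigm mentioned in the abstract above, the sketch below computes an average-energy test statistic over complex baseband samples and compares it with a threshold. The function names, sample values, and threshold are hypothetical choices for illustration and are not taken from the paper.

```python
def energy_statistic(samples):
    """Average energy of a block of complex baseband samples."""
    return sum(abs(s) ** 2 for s in samples) / len(samples)


def channel_occupied(samples, threshold):
    """Declare the channel occupied when the energy statistic exceeds the threshold."""
    return energy_statistic(samples) > threshold


# Hypothetical sample blocks: weak noise only versus a strong primary-user signal.
noise_only = [complex(0.1, -0.1)] * 8
with_signal = [complex(1.0, 1.0)] * 8
threshold = 0.5  # assumed value; in practice chosen from a target false-alarm rate

idle_decision = channel_occupied(noise_only, threshold)
busy_decision = channel_occupied(with_signal, threshold)
```

A mis-detection in this setting corresponds to `channel_occupied` returning `False` while a primary user is actually transmitting, which is the imperfect-sensing case the abstract highlights.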

11.4LGJan 16, 2025

Cooperative Decentralized Backdoor Attacks on Vertical Federated Learning

Seohyun Lee, Wenzhi Fang, Anindya Bijoy Das et al.

Federated learning (FL) is vulnerable to backdoor attacks, where adversaries alter model behavior on target classification labels by embedding triggers into data samples. While these attacks have received considerable attention in horizontal FL, they are less understood for vertical FL (VFL), where devices hold different features of the samples, and only the server holds the labels. In this work, we propose a novel backdoor attack on VFL which (i) does not rely on gradient information from the server and (ii) considers potential collusion among multiple adversaries for sample selection and trigger embedding. Our label inference model augments variational autoencoders with metric learning, which adversaries can train locally. A consensus process over the adversary graph topology determines which datapoints to poison. We further propose methods for trigger splitting across the adversaries, with an intensity-based implantation scheme skewing the server towards the trigger. Our convergence analysis reveals the impact of backdoor perturbations on VFL indicated by a stationarity gap for the trained model, which we verify empirically as well. We conduct experiments comparing our attack with recent backdoor VFL approaches, finding that ours obtains significantly higher success rates for the same main task performance despite not using server information. Additionally, our results verify the impact of collusion on attack performance.

2.3CRNov 5, 2024

Enhanced Real-Time Threat Detection in 5G Networks: A Self-Attention RNN Autoencoder Approach for Spectral Intrusion Analysis

Mohammadreza Kouchaki, Minglong Zhang, Aly S. Abdalla et al.

In the rapidly evolving landscape of 5G technology, safeguarding Radio Frequency (RF) environments against sophisticated intrusions is paramount, especially in dynamic spectrum access and management. This paper presents an enhanced experimental model that integrates a self-attention mechanism with a Recurrent Neural Network (RNN)-based autoencoder for the detection of anomalous spectral activities in 5G networks at the waveform level. Our approach, grounded in time-series analysis, processes in-phase and quadrature (I/Q) samples to identify irregularities that could indicate potential jamming attacks. The model's architecture, augmented with a self-attention layer, extends the capabilities of RNN autoencoders, enabling a more nuanced understanding of temporal dependencies and contextual relationships within the RF spectrum. Utilizing a simulated 5G Radio Access Network (RAN) test-bed constructed with srsRAN 5G and Software Defined Radios (SDRs), we generated a comprehensive stream of data that reflects real-world RF spectrum conditions and attack scenarios. The model is trained to reconstruct standard signal behavior, establishing a normative baseline against which deviations, indicative of security threats, are identified. The proposed architecture is designed to balance between detection precision and computational efficiency, so the LSTM network, enriched with self-attention, continues to optimize for minimal execution latency and power consumption. Conducted on a real-world SDR-based testbed, our results demonstrate the model's improved performance and accuracy in threat detection. Keywords: self-attention, real-time intrusion detection, RNN autoencoder, Transformer architecture, LSTM, time series anomaly detection, 5G Security, spectrum access security.

2.3SPApr 21, 2024

Minimum Description Feature Selection for Complexity Reduction in Machine Learning-based Wireless Positioning

Myeung Suk Oh, Anindya Bijoy Das, Taejoon Kim et al.

Recently, deep learning approaches have provided solutions to difficult problems in wireless positioning (WP). Although these WP algorithms have attained excellent and consistent performance against complex channel environments, the computational complexity coming from processing high-dimensional features can be prohibitive for mobile applications. In this work, we design a novel positioning neural network (P-NN) that utilizes the minimum description features to substantially reduce the complexity of deep learning-based WP. P-NN's feature selection strategy is based on maximum power measurements and their temporal locations to convey information needed to conduct WP. We improve P-NN's learning ability by intelligently processing two different types of inputs: sparse image and measurement matrices. Specifically, we implement a self-attention layer to reinforce the training ability of our network. We also develop a technique to adapt feature space size, optimizing over the expected information gain and the classification capability quantified with information-theoretic measures on signal bin selection. Numerical results show that P-NN achieves a significant advantage in performance-complexity tradeoff over deep learning baselines that leverage the full power delay profile (PDP). In particular, we find that P-NN achieves a large improvement in performance for low SNR, as unnecessary measurements are discarded in our minimum description features.

4.6LGFeb 3, 2024

Rethinking the Starting Point: Collaborative Pre-Training for Federated Downstream Tasks

Yun-Wei Chu, Dong-Jun Han, Seyyedali Hosseinalipour et al.

A few recent studies have demonstrated that leveraging centrally pre-trained models can offer advantageous initializations for federated learning (FL). However, existing pre-training methods do not generalize well when faced with an arbitrary set of downstream FL tasks. Specifically, they often (i) achieve limited average accuracy, particularly when there are unseen downstream labels, and (ii) result in significant accuracy variance, failing to provide a balanced performance across clients. To address these challenges, we propose CoPreFL, a collaborative/distributed pre-training approach which provides a robust initialization for downstream FL tasks. The key idea of CoPreFL is a model-agnostic meta-learning (MAML) procedure that tailors the global model to closely mimic heterogeneous and unseen FL scenarios, resulting in a pre-trained model that is rapidly adaptable to arbitrary FL tasks. Our MAML procedure incorporates performance variance into the meta-objective function, balancing performance across clients rather than solely optimizing for accuracy. Through extensive experiments, we demonstrate that CoPreFL obtains significant improvements in both average accuracy and variance across arbitrary downstream FL tasks with unseen/seen labels, compared with various pre-training baselines. We also show how CoPreFL is compatible with different well-known FL algorithms applied by the downstream tasks, enhancing performance in each case.

1.2ITDec 31, 2023

Coding for Gaussian Two-Way Channels: Linear and Learning-Based Approaches

Junghoon Kim, Taejoon Kim, Anindya Bijoy Das et al.

Although user cooperation cannot improve the capacity of Gaussian two-way channels (GTWCs) with independent noises, it can improve communication reliability. In this work, we aim to enhance and balance the communication reliability in GTWCs by minimizing the sum of error probabilities via joint design of encoders and decoders at the users. We first formulate general encoding/decoding functions, where the user cooperation is captured by the coupling of user encoding processes. The coupling effect renders the encoder/decoder design non-trivial, requiring effective decoding to capture this effect, as well as efficient power management at the encoders within power constraints. To address these challenges, we propose two different two-way coding strategies: linear coding and learning-based coding. For linear coding, we propose optimal linear decoding and discuss new insights on encoding regarding user cooperation to balance reliability. We then propose an efficient algorithm for joint encoder/decoder design. For learning-based coding, we introduce a novel recurrent neural network (RNN)-based coding architecture, where we propose interactive RNNs and a power control layer for encoding, and we incorporate bi-directional RNNs with an attention mechanism for decoding. Through simulations, we show that our two-way coding methodologies outperform conventional channel coding schemes (that do not utilize user cooperation) significantly in sum-error performance. We also demonstrate that our linear coding excels at high signal-to-noise ratios (SNRs), while our RNN-based coding performs best at low SNRs. We further investigate our two-way coding strategies in terms of power distribution, two-way coding benefit, different coding rates, and block-length gain.

4.1LGMay 30, 2025

RCCDA: Adaptive Model Updates in the Presence of Concept Drift under a Constrained Resource Budget

Adam Piaseczny, Md Kamran Chowdhury Shisher, Shiqiang Wang et al.

Machine learning (ML) algorithms deployed in real-world environments are often faced with the challenge of adapting models to concept drift, where the task data distributions are shifting over time. The problem becomes even more difficult when model performance must be maintained while adhering to strict resource constraints. Existing solutions often depend on drift-detection methods that produce high computational overhead for resource-constrained environments, and fail to provide strict guarantees on resource usage or theoretical performance assurances. To address these shortcomings, we propose RCCDA: a dynamic model update policy that optimizes ML training dynamics while ensuring compliance with predefined resource constraints, utilizing only past loss information and a tunable drift threshold. In developing our policy, we analytically characterize the evolution of model loss under concept drift with arbitrary training update decisions. Integrating these results into a Lyapunov drift-plus-penalty framework produces a lightweight greedy-optimal policy that provably limits update frequency and cost. Experimental results on four domain generalization datasets demonstrate that our policy outperforms baseline methods in inference accuracy while adhering to strict resource constraints under several schedules of concept drift, making our solution uniquely suited for real-time ML deployments.
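
To give a feel for how a Lyapunov drift-plus-penalty rule can cap update cost, the sketch below keeps a virtual queue that grows whenever spending exceeds a per-step budget and allows a model update only when a weighted loss term outweighs the queue-weighted cost. This is a generic textbook-style construction, not the RCCDA policy; using the current loss as the benefit proxy, the weight `V`, and all numeric values are assumptions made for illustration.

```python
def update_decisions(losses, cost, budget, V):
    """Greedy drift-plus-penalty rule with a virtual queue tracking budget overshoot."""
    queue = 0.0
    decisions = []
    for loss in losses:
        # Update only if the weighted benefit proxy exceeds the queue-weighted cost.
        do_update = V * loss > queue * cost
        decisions.append(do_update)
        spent = cost if do_update else 0.0
        # The virtual queue grows when spending exceeds the per-step budget.
        queue = max(queue + spent - budget, 0.0)
    return decisions


# Hypothetical run: constant loss, unit update cost, budget of 0.25 per step.
decisions = update_decisions([1.0] * 6, cost=1.0, budget=0.25, V=1.0)
```

In this run the rule updates twice, pauses for three steps while the queue drains, and then updates again. Raising `V` in this sketch favors loss reduction over budget adherence.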

4.1LGApr 8, 2025

Decentralized Domain Generalization with Style Sharing: Formal Model and Convergence Analysis

Shahryar Zehtabi, Dong-Jun Han, Seyyedali Hosseinalipour et al.

Much of federated learning (FL) focuses on settings where local dataset statistics remain the same between training and testing. However, this assumption often does not hold in practice due to distribution shifts, motivating the development of domain generalization (DG) approaches that leverage source domain data to train models capable of generalizing to unseen target domains. In this paper, we are motivated by two major gaps in existing work on FL and DG: (1) the lack of formal mathematical analysis of DG objectives; and (2) DG research in FL being limited to the star-topology architecture. We develop Decentralized Federated Domain Generalization with Style Sharing (StyleDDG), a decentralized DG algorithm which allows devices in a peer-to-peer network to achieve DG based on sharing style information inferred from their datasets. Additionally, we provide the first systematic approach to analyzing style-based DG training in decentralized networks. We cast existing centralized DG algorithms within our framework, and employ their formalisms to model StyleDDG. We then obtain analytical conditions under which convergence of StyleDDG can be guaranteed. Through experiments on popular DG datasets, we demonstrate that StyleDDG can obtain significant improvements in accuracy across target domains with minimal communication overhead compared to baseline decentralized gradient methods.