Shenghui Song

h-index31

26papers

1,348citations

Novelty55%

AI Score56

Ranked #7,541 of 194,257 authors (top 4%)#2,000 in LG (top 5%)

26 Papers

6.6ITFeb 14, 2023Code

Message Passing Meets Graph Neural Networks: A New Paradigm for Massive MIMO Systems

Hengtao He, Xianghao Yu, Jun Zhang et al.

As one of the core technologies for 5G systems, massive multiple-input multiple-output (MIMO) introduces dramatic capacity improvements along with very high beamforming and spatial multiplexing gains. When developing efficient physical layer algorithms for massive MIMO systems, message passing is one promising candidate owing to the superior performance. However, as their computational complexity increases dramatically with the problem size, the state-of-the-art message passing algorithms cannot be directly applied to future 6G systems, where an exceedingly large number of antennas are expected to be deployed. To address this issue, we propose a model-driven deep learning (DL) framework, namely the AMP-GNN for massive MIMO transceiver design, by considering the low complexity of the AMP algorithm and adaptability of GNNs. Specifically, the structure of the AMP-GNN network is customized by unfolding the approximate message passing (AMP) algorithm and introducing a graph neural network (GNN) module into it. The permutation equivariance property of AMP-GNN is proved, which enables the AMP-GNN to learn more efficiently and to adapt to different numbers of users. We also reveal the underlying reason why GNNs improve the AMP algorithm from the perspective of expectation propagation, which motivates us to amalgamate various GNNs with different message passing algorithms. In the simulation, we take the massive MIMO detection to exemplify that the proposed AMP-GNN significantly improves the performance of the AMP detector, achieves comparable performance as the state-of-the-art DL-based MIMO detectors, and presents strong robustness to various mismatches.

5.3LGAug 7, 2023

Binary Federated Learning with Client-Level Differential Privacy

Lumin Liu, Jun Zhang, Shenghui Song et al.

Federated learning (FL) is a privacy-preserving collaborative learning framework, and differential privacy can be applied to further enhance its privacy protection. Existing FL systems typically adopt Federated Average (FedAvg) as the training algorithm and implement differential privacy with a Gaussian mechanism. However, the inherent privacy-utility trade-off in these systems severely degrades the training performance if a tight privacy budget is enforced. Besides, the Gaussian mechanism requires model weights to be of high-precision. To improve communication efficiency and achieve a better privacy-utility trade-off, we propose a communication-efficient FL training algorithm with differential privacy guarantee. Specifically, we propose to adopt binary neural networks (BNNs) and introduce discrete noise in the FL setting. Binary model parameters are uploaded for higher communication efficiency and discrete noise is added to achieve the client-level differential privacy protection. The achieved performance guarantee is rigorously proved, and it is shown to depend on the level of discrete noise. Experimental results based on MNIST and Fashion-MNIST datasets will demonstrate that the proposed training algorithm achieves client-level privacy protection with performance gain while enjoying the benefits of low communication overhead from binary model updates.

1.2ITNov 28, 2022

Lightweight and Flexible Deep Equilibrium Learning for CSI Feedback in FDD Massive MIMO

Yifan Ma, Wentao Yu, Xianghao Yu et al.

In frequency-division duplexing (FDD) massive multiple-input multiple-output (MIMO) systems, downlink channel state information (CSI) needs to be sent back to the base station (BS) by the users, which causes prohibitive feedback overhead. In this paper, we propose a lightweight and flexible deep learning-based CSI feedback approach by capitalizing on deep equilibrium models. Different from existing deep learning-based methods that stack multiple explicit layers, we propose an implicit equilibrium block to mimic the behavior of an infinite-depth neural network. In particular, the implicit equilibrium block is defined by a fixed-point iteration and the trainable parameters in different iterations are shared, which results in a lightweight model. Furthermore, the number of forward iterations can be adjusted according to users' computation capability, enabling a flexible accuracy-efficiency trade-off. Simulation results will show that the proposed design obtains a comparable performance as the benchmarks but with much-reduced complexity and permits an accuracy-efficiency trade-off at runtime.

20.2LGApr 18, 2022Code

FedKL: Tackling Data Heterogeneity in Federated Reinforcement Learning by Penalizing KL Divergence

Zhijie Xie, S. H. Song

As a distributed learning paradigm, Federated Learning (FL) faces the communication bottleneck issue due to many rounds of model synchronization and aggregation. Heterogeneous data further deteriorates the situation by causing slow convergence. Although the impact of data heterogeneity on supervised FL has been widely studied, the related investigation for Federated Reinforcement Learning (FRL) is still in its infancy. In this paper, we first define the type and level of data heterogeneity for policy gradient based FRL systems. By inspecting the connection between the global and local objective functions, we prove that local training can benefit the global objective, if the local update is properly penalized by the total variation (TV) distance between the local and global policies. A necessary condition for the global policy to be learn-able from the local policy is also derived, which is directly related to the heterogeneity level. Based on the theoretical result, a Kullback-Leibler (KL) divergence based penalty is proposed, which, different from the conventional method that penalizes the model divergence in the parameter space, directly constrains the model outputs in the distribution space. Convergence proof of the proposed algorithm is also provided. By jointly penalizing the divergence of the local policy from the global policy with a global penalty and constraining each iteration of the local training with a local penalty, the proposed method achieves a better trade-off between training speed (step size) and convergence. Experiment results on two popular Reinforcement Learning (RL) experiment platforms demonstrate the advantage of the proposed algorithm over existing methods in accelerating and stabilizing the training process with heterogeneous data.

3.7CVJul 23, 2022

SSBNet: Improving Visual Recognition Efficiency by Adaptive Sampling

Ho Man Kwan, Shenghui Song

Downsampling is widely adopted to achieve a good trade-off between accuracy and latency for visual recognition. Unfortunately, the commonly used pooling layers are not learned, and thus cannot preserve important information. As another dimension reduction method, adaptive sampling weights and processes regions that are relevant to the task, and is thus able to better preserve useful information. However, the use of adaptive sampling has been limited to certain layers. In this paper, we show that using adaptive sampling in the building blocks of a deep neural network can improve its efficiency. In particular, we propose SSBNet which is built by inserting sampling layers repeatedly into existing networks like ResNet. Experiment results show that the proposed SSBNet can achieve competitive image classification and object detection performance on ImageNet and COCO datasets. For example, the SSB-ResNet-RS-200 achieved 82.6% accuracy on ImageNet dataset, which is 0.6% higher than the baseline ResNet-RS-152 with a similar complexity. Visualization shows the advantage of SSBNet in allowing different layers to focus on different positions, and ablation studies further validate the advantage of adaptive sampling over uniform methods.

1.2ITDec 25, 2025

Near-Field Communication with Massive Movable Antennas: An Electrostatic Equilibrium Perspective

Shicong Liu, Xianghao Yu, Shenghui Song et al.

Recent advancements in large-scale position-reconfigurable antennas have opened up new dimensions to effectively utilize the spatial degrees of freedom (DoFs) of wireless channels. However, the deployment of existing antenna placement schemes is primarily hindered by their limited scalability and frequently overlooked near-field effects in large-scale antenna systems. In this paper, we propose a novel antenna placement approach tailored for near-field massive multiple-input multiple-output systems, which effectively exploits the spatial DoFs to enhance spectral efficiency. For that purpose, we first reformulate the antenna placement problem in the angular domain, resulting in a weighted Fekete problem. We then derive the optimality condition and reveal that the {optimal} antenna placement is in principle an electrostatic equilibrium problem. To further reduce the computational complexity of numerical optimization, we propose an ordinary differential equation (ODE)-based framework to efficiently solve the equilibrium problem. In particular, the optimal antenna positions are characterized by the roots of the polynomial solutions to specific ODEs in the normalized angular domain. By simply adopting a two-step eigenvalue decomposition (EVD) approach, the optimal antenna positions can be efficiently obtained. Furthermore, we perform an asymptotic analysis when the antenna size tends to infinity, which yields a closed-form solution. Simulation results demonstrate that the proposed scheme efficiently harnesses the spatial DoFs of near-field channels with prominent gains in spectral efficiency and maintains robustness against system parameter mismatches. In addition, the derived asymptotic closed-form {solution} closely approaches the theoretical optimum across a wide range of practical scenarios.

5.3LGJul 10, 2023

Fairness-aware Federated Minimax Optimization with Convergence Guarantee

Gerry Windiarto Mohamad Dunda, Shenghui Song

Federated learning (FL) has garnered considerable attention due to its privacy-preserving feature. Nonetheless, the lack of freedom in managing user data can lead to group fairness issues, where models are biased towards sensitive factors such as race or gender. To tackle this issue, this paper proposes a novel algorithm, fair federated averaging with augmented Lagrangian method (FFALM), designed explicitly to address group fairness issues in FL. Specifically, we impose a fairness constraint on the training objective and solve the minimax reformulation of the constrained optimization problem. Then, we derive the theoretical upper bound for the convergence rate of FFALM. The effectiveness of FFALM in improving fairness is shown empirically on CelebA and UTKFace datasets in the presence of severe statistical heterogeneity.

1.2SPDec 1, 2025

Multimodal Mixture-of-Experts for ISAC in Low-Altitude Wireless Networks

Kai Zhang, Wentao Yu, Hengtao He et al.

Integrated sensing and communication (ISAC) is a key enabler for low-altitude wireless networks (LAWNs), providing simultaneous environmental perception and data transmission in complex aerial scenarios. By combining heterogeneous sensing modalities such as visual, radar, lidar, and positional information, multimodal ISAC can improve both situational awareness and robustness of LAWNs. However, most existing multimodal fusion approaches use static fusion strategies that treat all modalities equally and cannot adapt to channel heterogeneity or time-varying modality reliability in dynamic low-altitude environments. To address this fundamental limitation, we propose a mixture-of-experts (MoE) framework for multimodal ISAC in LAWNs. Each modality is processed by a dedicated expert network, and a lightweight gating module adaptively assigns fusion weights according to the instantaneous informativeness and reliability of each modality. To improve scalability under the stringent energy constraints of aerial platforms, we further develop a sparse MoE variant that selectively activates only a subset of experts, thereby reducing computation overhead while preserving the benefits of adaptive fusion. Comprehensive simulations on three typical ISAC tasks in LAWNs demonstrate that the proposed frameworks consistently outperform conventional multimodal fusion baselines in terms of learning performance and training sample efficiency.

1.4LGJan 21

Communication-Efficient Multi-Modal Edge Inference via Uncertainty-Aware Distributed Learning

Hang Zhao, Hongru Li, Dongfang Xu et al.

Semantic communication is emerging as a key enabler for distributed edge intelligence due to its capability to convey task-relevant meaning. However, achieving communication-efficient training and robust inference over wireless links remains challenging. This challenge is further exacerbated for multi-modal edge inference (MMEI) by two factors: 1) prohibitive communication overhead for distributed learning over bandwidth-limited wireless links, due to the \emph{multi-modal} nature of the system; and 2) limited robustness under varying channels and noisy multi-modal inputs. In this paper, we propose a three-stage communication-aware distributed learning framework to improve training and inference efficiency while maintaining robustness over wireless channels. In Stage~I, devices perform local multi-modal self-supervised learning to obtain shared and modality-specific encoders without device--server exchange, thereby reducing the communication cost. In Stage~II, distributed fine-tuning with centralized evidential fusion calibrates per-modality uncertainty and reliably aggregates features distorted by noise or channel fading. In Stage~III, an uncertainty-guided feedback mechanism selectively requests additional features for uncertain samples, optimizing the communication--accuracy tradeoff in the distributed setting. Experiments on RGB--depth indoor scene classification show that the proposed framework attains higher accuracy with far fewer training communication rounds and remains robust to modality degradation or channel variation, outperforming existing self-supervised and fully supervised baselines.

5.3LGOct 25, 2023

How Robust is Federated Learning to Communication Error? A Comparison Study Between Uplink and Downlink Channels

Linping Qu, Shenghui Song, Chi-Ying Tsui et al.

Because of its privacy-preserving capability, federated learning (FL) has attracted significant attention from both academia and industry. However, when being implemented over wireless networks, it is not clear how much communication error can be tolerated by FL. This paper investigates the robustness of FL to the uplink and downlink communication error. Our theoretical analysis reveals that the robustness depends on two critical parameters, namely the number of clients and the numerical range of model parameters. It is also shown that the uplink communication in FL can tolerate a higher bit error rate (BER) than downlink communication, and this difference is quantified by a proposed formula. The findings and theoretical analyses are further validated by extensive experiments.

5.1IVNov 12, 2025

ROI-based Deep Image Compression with Implicit Bit Allocation

Kai Hu, Han Wang, Renhe Liu et al.

Region of Interest (ROI)-based image compression has rapidly developed due to its ability to maintain high fidelity in important regions while reducing data redundancy. However, existing compression methods primarily apply masks to suppress background information before quantization. This explicit bit allocation strategy, which uses hard gating, significantly impacts the statistical distribution of the entropy model, thereby limiting the coding performance of the compression model. In response, this work proposes an efficient ROI-based deep image compression model with implicit bit allocation. To better utilize ROI masks for implicit bit allocation, this paper proposes a novel Mask-Guided Feature Enhancement (MGFE) module, comprising a Region-Adaptive Attention (RAA) block and a Frequency-Spatial Collaborative Attention (FSCA) block. This module allows for flexible bit allocation across different regions while enhancing global and local features through frequencyspatial domain collaboration. Additionally, we use dual decoders to separately reconstruct foreground and background images, enabling the coding network to optimally balance foreground enhancement and background quality preservation in a datadriven manner. To the best of our knowledge, this is the first work to utilize implicit bit allocation for high-quality regionadaptive coding. Experiments on the COCO2017 dataset show that our implicit-based image compression method significantly outperforms explicit bit allocation approaches in rate-distortion performance, achieving optimal results while maintaining satisfactory visual quality in the reconstructed background regions.

6.6SPMay 15, 2024

Tackling Distribution Shifts in Task-Oriented Communication with Information Bottleneck

Hongru Li, Jiawei Shao, Hengtao He et al.

Task-oriented communication aims to extract and transmit task-relevant information to significantly reduce the communication overhead and transmission latency. However, the unpredictable distribution shifts between training and test data, including domain shift and semantic shift, can dramatically undermine the system performance. In order to tackle these challenges, it is crucial to ensure that the encoded features can generalize to domain-shifted data and detect semanticshifted data, while remaining compact for transmission. In this paper, we propose a novel approach based on the information bottleneck (IB) principle and invariant risk minimization (IRM) framework. The proposed method aims to extract compact and informative features that possess high capability for effective domain-shift generalization and accurate semantic-shift detection without any knowledge of the test data during training. Specifically, we propose an invariant feature encoding approach based on the IB principle and IRM framework for domainshift generalization, which aims to find the causal relationship between the input data and task result by minimizing the complexity and domain dependence of the encoded feature. Furthermore, we enhance the task-oriented communication with the label-dependent feature encoding approach for semanticshift detection which achieves joint gains in IB optimization and detection performance. To avoid the intractable computation of the IB-based objective, we leverage variational approximation to derive a tractable upper bound for optimization. Extensive simulation results on image classification tasks demonstrate that the proposed scheme outperforms state-of-the-art approaches and achieves a better rate-distortion tradeoff.

5.3LGDec 28, 2023

FedSDD: Scalable and Diversity-enhanced Distillation for Model Aggregation in Federated Learning

Ho Man Kwan, Shenghui Song

Recently, innovative model aggregation methods based on knowledge distillation (KD) have been proposed for federated learning (FL). These methods not only improved the robustness of model aggregation over heterogeneous learning environment, but also allowed training heterogeneous models on client devices. However, the scalability of existing methods is not satisfactory, because the training cost on the server increases with the number of clients, which limits their application in large scale systems. Furthermore, the ensemble of existing methods is built from a set of client models initialized from the same checkpoint, causing low diversity. In this paper, we propose a scalable and diversity-enhanced federated distillation scheme, FedSDD, which decouples the training complexity from the number of clients to enhance the scalability, and builds the ensemble from a set of aggregated models with enhanced diversity. In particular, the teacher model in FedSDD is an ensemble built by a small group of aggregated (global) models, instead of all client models, such that the computation cost will not scale with the number of clients. Furthermore, to enhance diversity, FedSDD only performs KD to enhance one of the global models, i.e., the \textit{main global model}, which improves the performance of both the ensemble and the main global model. While partitioning client model into more groups allow building an ensemble with more aggregated models, the convergence of individual aggregated models will be slow down. We introduce the temporal ensembling which leverage the issues, and provide significant improvement with the heterogeneous settings. Experiment results show that FedSDD outperforms other FL methods, including FedAvg and FedDF, on the benchmark datasets.

3.7CVDec 1, 2024Code

Toward Real-Time Edge AI: Model-Agnostic Task-Oriented Communication with Visual Feature Alignment

Songjie Xie, Hengtao He, Shenghui Song et al.

Task-oriented communication presents a promising approach to improve the communication efficiency of edge inference systems by optimizing learning-based modules to extract and transmit relevant task information. However, real-time applications face practical challenges, such as incomplete coverage and potential malfunctions of edge servers. This situation necessitates cross-model communication between different inference systems, enabling edge devices from one service provider to collaborate effectively with edge servers from another. Independent optimization of diverse edge systems often leads to incoherent feature spaces, which hinders the cross-model inference for existing task-oriented communication. To facilitate and achieve effective cross-model task-oriented communication, this study introduces a novel framework that utilizes shared anchor data across diverse systems. This approach addresses the challenge of feature alignment in both server-based and on-device scenarios. In particular, by leveraging the linear invariance of visual features, we propose efficient server-based feature alignment techniques to estimate linear transformations using encoded anchor data features. For on-device alignment, we exploit the angle-preserving nature of visual features and propose to encode relative representations with anchor data to streamline cross-model communication without additional alignment procedures during the inference. The experimental results on computer vision benchmarks demonstrate the superior performance of the proposed feature alignment approaches in cross-model task-oriented communications. The runtime and computation overhead analysis further confirm the effectiveness of the proposed feature alignment approaches in real-time applications.

4.1LGOct 9, 2025

FedLAM: Low-latency Wireless Federated Learning via Layer-wise Adaptive Modulation

Linping Qu, Shenghui Song, Chi-Ying Tsui

In wireless federated learning (FL), the clients need to transmit the high-dimensional deep neural network (DNN) parameters through bandwidth-limited channels, which causes the communication latency issue. In this paper, we propose a layer-wise adaptive modulation scheme to save the communication latency. Unlike existing works which assign the same modulation level for all DNN layers, we consider the layers' importance which provides more freedom to save the latency. The proposed scheme can automatically decide the optimal modulation levels for different DNN layers. Experimental results show that the proposed scheme can save up to 73.9% of communication latency compared with the existing schemes.

7.1LGJun 2, 2025

The Actor-Critic Update Order Matters for PPO in Federated Reinforcement Learning

Zhijie Xie, Shenghui Song

In the context of Federated Reinforcement Learning (FRL), applying Proximal Policy Optimization (PPO) faces challenges related to the update order of its actor and critic due to the aggregation step occurring between successive iterations. In particular, when local actors are updated based on local critic estimations, the algorithm becomes vulnerable to data heterogeneity. As a result, the conventional update order in PPO (critic first, then actor) may cause heterogeneous gradient directions among clients, hindering convergence to a globally optimal policy. To address this issue, we propose FedRAC, which reverses the update order (actor first, then critic) to eliminate the divergence of critics from different clients. Theoretical analysis shows that the convergence bound of FedRAC is immune to data heterogeneity under mild conditions, i.e., bounded level of heterogeneity and accurate policy evaluation. Empirical results indicate that the proposed algorithm obtains higher cumulative rewards and converges more rapidly in five experiments, including three classical RL environments and a highly heterogeneous autonomous driving scenario using the SUMO traffic simulator.

2.6LGDec 2, 2024Code

Siamese Machine Unlearning with Knowledge Vaporization and Concentration

Songjie Xie, Hengtao He, Shenghui Song et al.

In response to the practical demands of the ``right to be forgotten" and the removal of undesired data, machine unlearning emerges as an essential technique to remove the learned knowledge of a fraction of data points from trained models. However, existing methods suffer from limitations such as insufficient methodological support, high computational complexity, and significant memory demands. In this work, we propose the concepts of knowledge vaporization and concentration to selectively erase learned knowledge from specific data points while maintaining representations for the remaining data. Utilizing the Siamese networks, we exemplify the proposed concepts and develop an efficient method for machine unlearning. Our proposed Siamese unlearning method does not require additional memory overhead and full access to the remaining dataset. Extensive experiments conducted across multiple unlearning scenarios showcase the superiority of Siamese unlearning over baseline methods, illustrating its ability to effectively remove knowledge from forgetting data, enhance model utility on remaining data, and reduce susceptibility to membership inference attacks.

2.6LGNov 12, 2024

Federated Low-Rank Adaptation with Differential Privacy over Wireless Networks

Tianqu Kang, Zixin Wang, Hengtao He et al.

Fine-tuning large pre-trained foundation models (FMs) on distributed edge devices presents considerable computational and privacy challenges. Federated fine-tuning (FedFT) mitigates some privacy issues by facilitating collaborative model training without the need to share raw data. To lessen the computational burden on resource-limited devices, combining low-rank adaptation (LoRA) with federated learning enables parameter-efficient fine-tuning. Additionally, the split FedFT architecture partitions an FM between edge devices and a central server, reducing the necessity for complete model deployment on individual devices. However, the risk of privacy eavesdropping attacks in FedFT remains a concern, particularly in sensitive areas such as healthcare and finance. In this paper, we propose a split FedFT framework with differential privacy (DP) over wireless networks, where the inherent wireless channel noise in the uplink transmission is utilized to achieve DP guarantees without adding an extra artificial noise. We shall investigate the impact of the wireless noise on convergence performance of the proposed framework. We will also show that by updating only one of the low-rank matrices in the split FedFT with DP, the proposed method can mitigate the noise amplification effect. Simulation results will demonstrate that the proposed framework achieves higher accuracy under strict privacy budgets compared to baseline methods.

3.3ITJun 26, 2024

Energy-Efficient Channel Decoding for Wireless Federated Learning: Convergence Analysis and Adaptive Design

Linping Qu, Yuyi Mao, Shenghui Song et al.

One of the most critical challenges for deploying distributed learning solutions, such as federated learning (FL), in wireless networks is the limited battery capacity of mobile clients. While it is a common belief that the major energy consumption of mobile clients comes from the uplink data transmission, this paper presents a novel finding, namely channel decoding also contributes significantly to the overall energy consumption of mobile clients in FL. Motivated by this new observation, we propose an energy-efficient adaptive channel decoding scheme that leverages the intrinsic robustness of FL to model errors. In particular, the robustness is exploited to reduce the energy consumption of channel decoders at mobile clients by adaptively adjusting the number of decoding iterations. We theoretically prove that wireless FL with communication errors can converge at the same rate as the case with error-free communication provided the bit error rate (BER) is properly constrained. An adaptive channel decoding scheme is then proposed to improve the energy efficiency of wireless FL systems. Experimental results demonstrate that the proposed method maintains the same learning accuracy while reducing the channel decoding energy consumption by ~20% when compared to an existing approach.

10.4LGJun 26, 2024

FedAQ: Communication-Efficient Federated Edge Learning via Joint Uplink and Downlink Adaptive Quantization

Linping Qu, Shenghui Song, Chi-Ying Tsui

Federated learning (FL) is a powerful machine learning paradigm which leverages the data as well as the computational resources of clients, while protecting clients' data privacy. However, the substantial model size and frequent aggregation between the server and clients result in significant communication overhead, making it challenging to deploy FL in resource-limited wireless networks. In this work, we aim to mitigate the communication overhead by using quantization. Previous research on quantization has primarily focused on the uplink communication, employing either fixed-bit quantization or adaptive quantization methods. In this work, we introduce a holistic approach by joint uplink and downlink adaptive quantization to reduce the communication overhead. In particular, we optimize the learning convergence by determining the optimal uplink and downlink quantization bit-length, with a communication energy constraint. Theoretical analysis shows that the optimal quantization levels depend on the range of model gradients or weights. Based on this insight, we propose a decreasing-trend quantization for the uplink and an increasing-trend quantization for the downlink, which aligns with the change of the model parameters during the training process. Experimental results show that, the proposed joint uplink and downlink adaptive quantization strategy can save up to 66.7% energy compared with the existing schemes.

3.8LGMay 24, 2023

Local SGD Accelerates Convergence by Exploiting Second Order Information of the Loss Function

Linxuan Pan, Shenghui Song

With multiple iterations of updates, local statistical gradient descent (L-SGD) has been proven to be very effective in distributed machine learning schemes such as federated learning. In fact, many innovative works have shown that L-SGD with independent and identically distributed (IID) data can even outperform SGD. As a result, extensive efforts have been made to unveil the power of L-SGD. However, existing analysis failed to explain why the multiple local updates with small mini-batches of data (L-SGD) can not be replaced by the update with one big batch of data and a larger learning rate (SGD). In this paper, we offer a new perspective to understand the strength of L-SGD. We theoretically prove that, with IID data, L-SGD can effectively explore the second order information of the loss function. In particular, compared with SGD, the updates of L-SGD have much larger projection on the eigenvectors of the Hessian matrix with small eigenvalues, which leads to faster convergence. Under certain conditions, L-SGD can even approach the Newton method. Experiment results over two popular datasets validate the theoretical results.

13.1LGOct 5, 2021

FedDQ: Communication-Efficient Federated Learning with Descending Quantization

Linping Qu, Shenghui Song, Chi-Ying Tsui

Federated learning (FL) is an emerging learning paradigm without violating users' privacy. However, large model size and frequent model aggregation cause serious communication bottleneck for FL. To reduce the communication volume, techniques such as model compression and quantization have been proposed. Besides the fixed-bit quantization, existing adaptive quantization schemes use ascending-trend quantization, where the quantization level increases with the training stages. In this paper, we first investigate the impact of quantization on model convergence, and show that the optimal quantization level is directly related to the range of the model updates. Given the model is supposed to converge with the progress of the training, the range of the model updates will gradually shrink, indicating that the quantization level should decrease with the training stages. Based on the theoretical analysis, a descending quantization scheme named FedDQ is proposed. Experimental results show that the proposed descending quantization scheme can save up to 65.2% of the communicated bit volume and up to 68% of the communication rounds, when compared with existing schemes.

5.1SPAug 3, 2021

Neural Calibration for Scalable Beamforming in FDD Massive MIMO with Implicit Channel Estimation

Yifan Ma, Yifei Shen, Xianghao Yu et al.

Channel estimation and beamforming play critical roles in frequency-division duplexing (FDD) massive multiple-input multiple-output (MIMO) systems. However, these two modules have been treated as two stand-alone components, which makes it difficult to achieve a global system optimality. In this paper, we propose a deep learning-based approach that directly optimizes the beamformers at the base station according to the received uplink pilots, thereby, bypassing the explicit channel estimation. Different from the existing fully data-driven approach where all the modules are replaced by deep neural networks (DNNs), a neural calibration method is proposed to improve the scalability of the end-to-end design. In particular, the backbone of conventional time-efficient algorithms, i.e., the least-squares (LS) channel estimator and the zero-forcing (ZF) beamformer, is preserved and DNNs are leveraged to calibrate their inputs for better performance. The permutation equivariance property of the formulated resource allocation problem is then identified to design a low-complexity neural network architecture. Simulation results will show the superiority of the proposed neural calibration method over benchmark schemes in terms of both the spectral efficiency and scalability in large-scale wireless networks.

22.3LGMar 26, 2021

Hierarchical Federated Learning with Quantization: Convergence Analysis and System Design

Lumin Liu, Jun Zhang, Shenghui Song et al.

Federated learning (FL) is a powerful distributed machine learning framework where a server aggregates models trained by different clients without accessing their private data. Hierarchical FL, with a client-edge-cloud aggregation hierarchy, can effectively leverage both the cloud server's access to many clients' data and the edge servers' closeness to the clients to achieve a high communication efficiency. Neural network quantization can further reduce the communication overhead during model uploading. To fully exploit the advantages of hierarchical FL, an accurate convergence analysis with respect to the key system parameters is needed. Unfortunately, existing analysis is loose and does not consider model quantization. In this paper, we derive a tighter convergence bound for hierarchical FL with quantization. The convergence result leads to practical guidelines for important design problems such as the client-edge aggregation and edge-client association strategies. Based on the obtained analytical results, we optimize the two aggregation intervals and show that the client-edge aggregation interval should slowly decay while the edge-cloud aggregation interval needs to adapt to the ratio of the client-edge and edge-cloud propagation delay. Simulation results shall verify the design guidelines and demonstrate the effectiveness of the proposed aggregation strategy.

36.0NIMay 16, 2019

Client-Edge-Cloud Hierarchical Federated Learning

Lumin Liu, Jun Zhang, S. H. Song et al.

Federated Learning is a collaborative machine learning framework to train a deep learning model without accessing clients' private data. Previous works assume one central parameter server either at the cloud or at the edge. The cloud server can access more data but with excessive communication overhead and long latency, while the edge server enjoys more efficient communications with the clients. To combine their advantages, we propose a client-edge-cloud hierarchical Federated Learning system, supported with a HierFAVG algorithm that allows multiple edge servers to perform partial model aggregation. In this way, the model can be trained faster and better communication-computation trade-offs can be achieved. Convergence analysis is provided for HierFAVG and the effects of key parameters are also investigated, which lead to qualitative design guidelines. Empirical experiments verify the analysis and demonstrate the benefits of this hierarchical architecture in different data distribution scenarios. Particularly, it is shown that by introducing the intermediate edge servers, the model training time and the energy consumption of the end devices can be simultaneously reduced compared to cloud-based Federated Learning.

5.1NIMay 15, 2019

Connectivity-Aware UAV Path Planning with Aerial Coverage Maps

Hongyu Yang, Jun Zhang, S. H. Song et al.

Cellular networks are promising to support effective wireless communications for unmanned aerial vehicles (UAVs), which will help to enable various long-range UAV applications. However, these networks are optimized for terrestrial users, and thus do not guarantee seamless aerial coverage. In this paper, we propose to overcome this difficulty by exploiting controllable mobility of UAVs, and investigate connectivity-aware UAV path planning. To explicitly impose communication requirements on UAV path planning, we introduce two new metrics to quantify the cellular connectivity quality of a UAV path. Moreover, aerial coverage maps are used to provide accurate locations of scattered coverage holes in the complicated propagation environment. We formulate the UAV path planning problem as finding the shortest path subject to connectivity constraints. Based on graph search methods, a novel connectivity-aware path planning algorithm with low complexity is proposed. The effectiveness and superiority of our proposed algorithm are demonstrated using the aerial coverage map of an urban section in Virginia, which is built by ray tracing. Simulation results also illustrate a tradeoff between the path length and connectivity quality of UAVs.