Guangxu Zhu

IT
h-index: 19
Papers: 47
Citations: 2,226
Novelty: 49%
AI Score: 58

47 Papers

IT | Jun 3
Bounded Deep Unfolding for Joint Beamforming and Scheduling in Multi-Cell MIMO Networks

Jiansheng Li, Shuqi Chai, Fan Xu et al.

This paper investigates the joint resource block group (RBG) scheduling and beamforming optimization problem for weighted sum-rate (WSR) maximization in multi-cell multiple-input multiple-output (MIMO) downlink networks. While the Fast Fractional Programming (FastFP) framework provides a reliable model-driven solution, it suffers from conservative continuous beamforming updates and prohibitive computational overhead during the discrete RBG matching phase. To address these bottlenecks, we propose a joint deep unfolding framework comprising two core modules: P-Net and K-Net. For continuous beamforming, P-Net learns an adaptive relaxation factor along the analytical FastFP update direction. By strictly constraining this factor within an ascent-preserving interval, P-Net accelerates the optimization trajectory while rigorously retaining monotonic improvement and stationary-point convergence guarantees. For discrete RBG scheduling, K-Net learns a long-horizon priority policy that guides a low-complexity greedy assignment, effectively preserving the assignment quality while bypassing the high complexity of Hungarian matching. Both networks leverage analytical algorithmic priors and utilize recurrent parameter sharing, enabling flexible inference beyond the training horizon. Extensive simulations demonstrate that the proposed joint framework achieves higher WSR and faster execution times than conventional model-driven baselines, while generalizing robustly across unseen network scales, antenna configurations, and channel conditions without retraining.
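
The trade-off between greedy assignment and optimal matching mentioned above can be illustrated with a minimal sketch. It assumes a one-to-one user-to-RBG assignment and a hypothetical `priority` matrix standing in for the scores a learned policy such as K-Net might output; it is not the paper's algorithm, and the brute-force search merely stands in for Hungarian matching on a toy instance.

```python
import itertools
import numpy as np

def greedy_assign(priority):
    """Greedy one-to-one assignment: visit (user, RBG) pairs in descending
    priority and keep a pair only if both the user and the RBG are still free."""
    n_users, n_rbgs = priority.shape
    order = np.argsort(priority, axis=None)[::-1]  # flat indices, highest first
    used_users, used_rbgs, assignment = set(), set(), {}
    for flat in order:
        user, rbg = divmod(int(flat), n_rbgs)
        if user not in used_users and rbg not in used_rbgs:
            assignment[rbg] = user
            used_users.add(user)
            used_rbgs.add(rbg)
    return assignment

# Hypothetical priority scores (rows: users, columns: RBGs).
priority = np.array([[0.9, 0.8, 0.1],
                     [0.7, 0.2, 0.3],
                     [0.4, 0.6, 0.5]])

assignment = greedy_assign(priority)
greedy_total = sum(priority[user, rbg] for rbg, user in assignment.items())

# Exhaustive search over all one-to-one assignments (only feasible for toy sizes).
optimal_total = max(
    sum(priority[user, rbg] for user, rbg in enumerate(perm))
    for perm in itertools.permutations(range(3))
)
```

On this toy matrix the greedy pass is cheaper but lands below the exhaustive optimum, which is why the quality of the priority scores matters for a greedy scheduler.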

IT | Jul 3, 2022
Task-Oriented Sensing, Computation, and Communication Integration for Multi-Device Edge AI

Dingzhu Wen, Peixi Liu, Guangxu Zhu et al.

This paper studies a new multi-device edge artificial intelligence (AI) system, which jointly exploits AI model split inference and integrated sensing and communication (ISAC) to enable low-latency intelligent services at the network edge. In this system, multiple ISAC devices perform radar sensing to obtain multi-view data, and then offload quantized versions of the extracted features to a centralized edge server, which conducts model inference based on the cascaded feature vectors. Under this setup and by considering classification tasks, we measure the inference accuracy by adopting an approximate but tractable metric, namely discriminant gain, which is defined as the distance between two classes in the Euclidean feature space under normalized covariance. To maximize the discriminant gain, we first quantify the influence of the sensing, computation, and communication processes on it with a derived closed-form expression. Then, an end-to-end task-oriented resource management approach is developed by integrating the three processes into a joint design. This integrated sensing, computation, and communication (ISCC) design approach, however, leads to a challenging non-convex optimization problem, due to the complicated form of the discriminant gain and the device heterogeneity in terms of channel gain, quantization level, and generated feature subsets. Remarkably, the considered non-convex problem can be optimally solved based on the sum-of-ratios method. This gives the optimal ISCC scheme, which jointly determines the transmit power and time allocation at multiple devices for sensing and communication, as well as their quantization bit allocation for computation distortion control. By using human motion recognition as a concrete AI inference task, extensive experiments are conducted to verify the performance of our derived optimal ISCC scheme.
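
As a rough illustration of the discriminant gain described above, the sketch below computes a variance-normalized squared distance between two class centroids. It assumes a shared diagonal covariance across classes; the function name and the toy numbers are hypothetical, and the paper's closed-form expression (which also accounts for sensing, quantization, and channel effects) is not reproduced here.

```python
import numpy as np

def discriminant_gain(mu_a, mu_b, var):
    """Squared distance between two class centroids, with each feature
    dimension scaled by its variance (shared diagonal covariance assumed)."""
    mu_a, mu_b, var = (np.asarray(x, dtype=float) for x in (mu_a, mu_b, var))
    return float(np.sum((mu_a - mu_b) ** 2 / var))

# Same centroids, but the first feature is noisier in the second case.
gain_clean = discriminant_gain([0.0, 1.0], [2.0, 1.0], [1.0, 1.0])
gain_noisy = discriminant_gain([0.0, 1.0], [2.0, 1.0], [4.0, 1.0])
```

Larger feature variance, for example from sensing noise or coarse quantization, shrinks the gain even when the centroids stay put, which is the lever a resource allocation scheme can act on.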

RO | Jun 28, 2023
Communication Resources Constrained Hierarchical Federated Learning for End-to-End Autonomous Driving

Wei-Bin Kou, Shuai Wang, Guangxu Zhu et al.

While federated learning (FL) improves the generalization of end-to-end autonomous driving through model aggregation, conventional single-hop FL (SFL) suffers from a slow convergence rate due to long-range communication between vehicles and the cloud server. Hierarchical federated learning (HFL) overcomes this drawback by introducing mid-point edge servers. However, balancing constrained communication resources against HFL performance becomes an urgent problem. This paper proposes an optimization-based Communication Resource Constrained Hierarchical Federated Learning (CRCHFL) framework to minimize the generalization error of the autonomous driving model using hybrid data and model aggregation. The effectiveness of the proposed CRCHFL is evaluated in the Car Learning to Act (CARLA) simulation platform. Results show that the proposed CRCHFL both accelerates the convergence rate and enhances the generalization of the federated autonomous driving model. Moreover, under the same communication resource budget, it outperforms HFL by 10.33% and SFL by 12.44%.

MA | Jun 1
QoEReasoner: An Agentic Reasoning Framework for Automated and Explainable QoE Diagnosis in RANs

Qizhe Li, Haolong Chen, Shan Dai et al.

Diagnosing Quality-of-Experience (QoE) degradations in operational Radio Access Networks (RANs) is a critical but notoriously complex task, traditionally requiring labor-intensive expert analysis over high-dimensional, cross-layer telemetry. While Large Language Models (LLMs) offer unprecedented reasoning capabilities, they are fundamentally unsuited for raw RAN troubleshooting: they fail at numeric time-series analysis, hallucinate protocol-violating causal links, and lack the stateful rigor required for multi-step fault localization. To bridge this gap, we present QoEReasoner, an end-to-end, LLM-driven agentic system designed for automated and explainable QoE diagnosis. QoEReasoner tames the inherent unpredictability of LLMs by grounding their reasoning in the physical realities of the network. It employs deterministic tools to reliably translate raw numeric KPIs into structured evidence, enforces protocol-consistent fault propagation through a domain-specific Knowledge Base, and leverages a Historical Bank of expert-validated cases to guide hypothesis generation. A stateful central planner orchestrates this closed-loop process across anomaly detection, causal tracing, and root-cause localization. Evaluations on real-world operational RAN datasets demonstrate that QoEReasoner outperforms strong baselines by 18%-40% in accuracy across multiple diagnostic tasks. Furthermore, it reduces diagnostic time from approximately 30 minutes of manual expert analysis to just 3 minutes per session, delivering highly interpretable, expert-grade reports while remaining robust across diverse LLM backbones.

CV | Aug 20, 2024 | Code
CrossFi: A Cross Domain Wi-Fi Sensing Framework Based on Siamese Network

Zijian Zhao, Tingwei Chen, Zhijie Cai et al.

In recent years, Wi-Fi sensing has garnered significant attention due to its numerous benefits, such as privacy protection, low cost, and penetration ability. Extensive research has been conducted in this field, focusing on areas such as gesture recognition, people identification, and fall detection. However, many data-driven methods encounter challenges related to domain shift, where the model fails to perform well in environments different from the training data. One major factor contributing to this issue is the limited availability of Wi-Fi sensing datasets, which makes models learn excessive irrelevant information and over-fit to the training set. Unfortunately, collecting large-scale Wi-Fi sensing datasets across diverse scenarios is a challenging task. To address this problem, we propose CrossFi, a siamese network-based approach that excels in both in-domain and cross-domain scenarios, including few-shot and zero-shot scenarios, and even works in a few-shot new-class scenario where the testing set contains new categories. The core component of CrossFi is a sample-similarity calculation network called CSi-Net, which improves the structure of the siamese network by using an attention mechanism to capture similarity information, instead of simply calculating the distance or cosine similarity. Based on it, we develop an extra Weight-Net that can generate a template for each class, so that CrossFi can work in different scenarios. Experimental results demonstrate that CrossFi achieves state-of-the-art performance across various scenarios. In the gesture recognition task, CrossFi achieves an accuracy of 98.17% in the in-domain scenario, 91.72% in the one-shot cross-domain scenario, 64.81% in the zero-shot cross-domain scenario, and 84.75% in the one-shot new-class scenario. The code for our model is publicly available at https://github.com/RS2002/CrossFi.

IT | Nov 2, 2022
Task-Oriented Over-the-Air Computation for Multi-Device Edge AI

Dingzhu Wen, Xiang Jiao, Peixi Liu et al.

Departing from the classic paradigm of data-centric designs, 6G networks supporting edge AI feature task-oriented techniques that focus on the effective and efficient execution of AI tasks. Targeting end-to-end system performance, such techniques are sophisticated as they aim to seamlessly integrate sensing (data acquisition), communication (data transmission), and computation (data processing). Aligned with this paradigm shift, a task-oriented over-the-air computation (AirComp) scheme is proposed in this paper for a multi-device split-inference system. In the considered system, local feature vectors, which are extracted from the real-time noisy sensory data on devices, are aggregated over-the-air by exploiting the waveform superposition in a multiuser channel. Then the aggregated features as received at a server are fed into an inference model, with the result used for decision making or control of actuators. To design inference-oriented AirComp, the transmit precoders at edge devices and the receive beamforming at the edge server are jointly optimized to rein in the aggregation error and maximize the inference accuracy. The problem is made tractable by measuring the inference accuracy using a surrogate metric called discriminant gain, which measures the discernibility of two object classes in the application of object/event classification. It is discovered that the conventional AirComp beamforming design, which minimizes the mean square error with respect to the noiseless case in generic AirComp, may not lead to the optimal classification accuracy. The reason is that it overlooks the fact that feature dimensions have different sensitivities to aggregation errors and are thus of different importance levels for classification. This issue is addressed in this work via a new task-oriented AirComp scheme designed by directly maximizing the derived discriminant gain.

IT | Jun 11, 2023
Task-Oriented Integrated Sensing, Computation and Communication for Wireless Edge AI

Hong Xing, Guangxu Zhu, Dongzhu Liu et al.

With the advent of emerging IoT applications such as autonomous driving, digital twins, and the metaverse, which feature massive data sensing, analysis, and inference as well as critical latency requirements in beyond-5G (B5G) networks, edge artificial intelligence (AI) has been proposed to bring the high-performance computation of a conventional cloud down to the network edge. Recently, the convergence of wireless sensing, computation, and communication (SC$^2$) for specific edge AI tasks has prompted a paradigm shift by enabling (partial) sharing of the radio-frequency (RF) transceivers and information processing pipelines among these three fundamental functionalities of IoT. However, most existing design frameworks separate these designs, incurring unnecessary signaling overhead and wasted energy; it is therefore of paramount importance to advance fully integrated sensing, computation, and communication (ISCC) to achieve ultra-reliable and low-latency edge intelligence acquisition. In this article, we provide an overview of the principles of enabling ISCC technologies, followed by two concrete use cases of edge AI tasks demonstrating the advantage of task-oriented ISCC, and point out some practical challenges in edge AI design with advanced ISCC solutions.

IT | Jun 5, 2023
Integrated Sensing, Computation, and Communication for UAV-assisted Federated Edge Learning

Yao Tang, Guangxu Zhu, Wei Xu et al.

Federated edge learning (FEEL) enables privacy-preserving model training through periodic communication between edge devices and the server. Unmanned Aerial Vehicle (UAV)-mounted edge devices are particularly advantageous for FEEL due to their flexibility and mobility in efficient data collection. In UAV-assisted FEEL, sensing, computation, and communication are coupled and compete for limited onboard resources, and UAV deployment also affects sensing and communication performance. Therefore, the joint design of UAV deployment and resource allocation is crucial to achieving the optimal training performance. In this paper, we address the problem of joint UAV deployment design and resource allocation for FEEL via a concrete case study of human motion recognition based on wireless sensing. We first analyze the impact of UAV deployment on the sensing quality and identify a threshold value for the sensing elevation angle that guarantees a satisfactory quality of data samples. Due to the non-ideal sensing channels, we consider the probabilistic sensing model, where the successful sensing probability of each UAV is determined by its position. Then, we derive the upper bound of the FEEL training loss as a function of the sensing probability. Theoretical results suggest that the convergence rate can be improved if UAVs have a uniform successful sensing probability. Based on this analysis, we formulate a training time minimization problem by jointly optimizing UAV deployment, integrated sensing, computation, and communication (ISCC) resources under a desirable optimality gap constraint. To solve this challenging mixed-integer non-convex problem, we apply the alternating optimization technique, and propose the bandwidth, batch size, and position optimization (BBPO) scheme to optimize these three decision variables alternately.

IT | Aug 7, 2022
Low-Latency Cooperative Spectrum Sensing via Truncated Vertical Federated Learning

Zezhong Zhang, Guangxu Zhu, Shuguang Cui

In recent years, the exponential increase in the demand for wireless data transmission has raised the urgency of accurate spectrum sensing approaches to improve spectrum efficiency. The unreliability of conventional spectrum sensing methods that use measurements from a single secondary user (SU) has motivated research on cooperative spectrum sensing (CSS). In this work, we propose a vertical federated learning (VFL) framework to exploit the distributed features across multiple SUs without compromising data privacy. However, the repetitive training process in VFL faces the issue of high communication latency. To accelerate the training process, we propose a truncated vertical federated learning (T-VFL) algorithm, where the training latency is greatly reduced by integrating the standard VFL algorithm with a channel-aware user scheduling policy. The convergence performance of T-VFL is established via mathematical analysis and corroborated by simulation results. Moreover, to guarantee the convergence performance of the T-VFL algorithm, we derive three design rules for the neural architectures used under the VFL framework, whose effectiveness is demonstrated through simulations.

RO | Apr 2 | Code
Bridging Large-Model Reasoning and Real-Time Control via Agentic Fast-Slow Planning

Jiayi Chen, Shuai Wang, Guangxu Zhu et al.

Large foundation models enable powerful reasoning for autonomous systems, but mapping semantic intent to reliable real-time control remains challenging. Existing approaches either (i) let Large Language Models (LLMs) generate trajectories directly - brittle, hard to verify, and latency-prone - or (ii) adjust Model Predictive Control (MPC) objectives online - mixing slow deliberation with fast control and blurring interfaces. We propose Agentic Fast-Slow Planning, a hierarchical framework that decouples perception, reasoning, planning, and control across natural timescales. The framework contains two bridges. Perception2Decision compresses scenes into ego-centric topologies using an on-vehicle Vision-Language Model (VLM) detector, then maps them to symbolic driving directives in the cloud with an LLM decision maker - reducing bandwidth and delay while preserving interpretability. Decision2Trajectory converts directives into executable paths: Semantic-Guided A* embeds language-derived soft costs into classical search to bias solutions toward feasible trajectories, while an Agentic Refinement Module adapts planner hyperparameters using feedback and memory. Finally, MPC tracks the trajectories in real time, with optional cloud-guided references for difficult cases. Experiments in CARLA show that Agentic Fast-Slow Planning improves robustness under perturbations, reducing lateral deviation by up to 45% and completion time by over 12% compared to pure MPC and an A*-guided MPC baseline. Code is available at https://github.com/cjychenjiayi/icra2026_AFSP.

IT | Jul 1, 2024
Task-oriented Over-the-air Computation for Edge-device Co-inference with Balanced Classification Accuracy

Xiang Jiao, Dingzhu Wen, Guangxu Zhu et al.

Edge-device co-inference, which concerns the cooperation between edge devices and an edge server for completing inference tasks over wireless networks, has been a promising technique for enabling various kinds of intelligent services at the network edge, e.g., autonomous driving. In this paradigm, the design objective of the network shifts from the traditional communication throughput to the effective and efficient execution of the inference task underpinned by the network, measured by, e.g., the inference accuracy and latency. In this paper, a task-oriented over-the-air computation scheme is proposed for a multi-device artificial intelligence system. In particular, a novel tractable inference accuracy metric is proposed for classification tasks, which is called minimum pair-wise discriminant gain. Unlike prior work measuring the average over all class pairs in feature space, it measures the minimum distance over all class pairs. By maximizing the minimum pair-wise discriminant gain instead of its average counterpart, any pair of classes can be better separated in the feature space, thus leading to a balanced and improved inference accuracy for all classes. Besides, this paper jointly optimizes the minimum discriminant gain of all feature elements instead of separately maximizing that of each element as in existing designs. As a result, the transmit power can be adaptively allocated to the feature elements according to their different contributions to the inference accuracy, opening an extra degree of freedom to improve inference performance. Extensive experiments are conducted using a concrete use case of human motion recognition to verify the superiority of the proposed design over the benchmarking scheme.
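
The difference between the average and the minimum pair-wise discriminant gain can be seen on a toy example. The sketch below assumes a shared diagonal covariance and uses hypothetical class centroids; it illustrates only the metric, not the paper's power-allocation or beamforming design.

```python
import itertools
import numpy as np

def pairwise_gains(means, var):
    """Variance-normalized squared centroid distance for every class pair
    (shared diagonal covariance assumed)."""
    means = np.asarray(means, dtype=float)
    var = np.asarray(var, dtype=float)
    return {
        (i, j): float(np.sum((means[i] - means[j]) ** 2 / var))
        for i, j in itertools.combinations(range(len(means)), 2)
    }

# Hypothetical centroids: classes 0 and 1 are close, class 2 is far away.
means = [[0.0, 0.0], [0.5, 0.0], [4.0, 0.0]]
var = [1.0, 1.0]

gains = pairwise_gains(means, var)
average_gain = sum(gains.values()) / len(gains)
minimum_gain = min(gains.values())
worst_pair = min(gains, key=gains.get)
```

Here the average is dominated by the well-separated class, while the minimum exposes the pair that is hardest to tell apart, which is the pair a max-min objective would prioritize.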

LG | Sep 29, 2024
Fast-Convergent and Communication-Alleviated Heterogeneous Hierarchical Federated Learning in Autonomous Driving

Wei-Bin Kou, Qingfeng Lin, Ming Tang et al.

Street Scene Semantic Understanding (denoted as TriSU) is a complex task for autonomous driving (AD). However, an inference model trained on data from a particular geographical region generalizes poorly when applied in other regions due to inter-city data domain shift. Hierarchical Federated Learning (HFL) offers a potential solution for improving TriSU model generalization through collaborative privacy-preserving training over distributed datasets from different cities. Unfortunately, it suffers from slow convergence because data from different cities have disparate statistical properties. Going beyond existing HFL methods, we propose a Gaussian heterogeneous HFL algorithm (FedGau) to address inter-city data heterogeneity so that convergence can be accelerated. In the proposed FedGau algorithm, both a single RGB image and an RGB dataset are modelled as Gaussian distributions for aggregation weight design. This approach not only differentiates each RGB image by its respective statistical distribution, but also exploits the statistics of the dataset from each city in addition to the conventionally considered data volume. With the proposed approach, convergence is accelerated by 35.5%-40.6% compared to existing state-of-the-art (SOTA) HFL methods. In addition, to reduce the communication resources involved, we further introduce a novel performance-aware adaptive resource scheduling (AdapRS) policy. Unlike the traditional static resource scheduling policy that exchanges a fixed number of models between two adjacent aggregations, AdapRS adjusts the number of model aggregations at different levels of HFL so that unnecessary communications are minimized. Extensive experiments demonstrate that AdapRS saves 29.65% of the communication overhead compared to the conventional static resource scheduling policy while maintaining almost the same performance.

AI | Sep 6, 2024
An overview of domain-specific foundation model: key technologies, applications and challenges

Haolong Chen, Hanzhi Chen, Zijian Zhao et al.

The impressive performance of ChatGPT and other foundation-model-based products in human language understanding has prompted both academia and industry to explore how these models can be tailored for specific industries and application scenarios. This process, known as the customization of domain-specific foundation models (FMs), addresses the limitations of general-purpose models, which may not fully capture the unique patterns and requirements of domain-specific data. Despite its importance, there is a notable lack of comprehensive overview papers on building domain-specific FMs, while numerous resources exist for general-purpose models. To bridge this gap, this article provides a timely and thorough overview of the methodology for customizing domain-specific FMs. It introduces basic concepts, outlines the general architecture, and surveys key methods for constructing domain-specific models. Furthermore, the article discusses various domains that can benefit from these specialized models and highlights the challenges ahead. Through this overview, we aim to offer valuable guidance and reference for researchers and practitioners from diverse fields to develop their own customized FMs.

LG | May 19
OmniISR: A Unified Framework for Centralized and Federated Learning via Intermediate Supervision and Regularization

Wei-Bin Kou, Guangxu Zhu, Ming Tang et al.

The global deployment of edge intelligence operates across heterogeneous legal frameworks. While some regions permit centralized learning (CL) via cloud data aggregation, others enforce strict data localization, necessitating federated learning (FL). This operational dichotomy introduces two incompatible optimization regimes (i.e., unbiased global gradients coupled with internal covariate shift in CL versus biased, drift-prone local updates in FL), so that any naive integration of the two lacks rigorous theoretical guarantees. To fill this gap, we propose OmniISR, a unified framework that fuses pure CL, pure FL, and hybrid CL-FL training modes by attaching intermediate supervision and regularization (ISR) signals at multiple hidden layers. Specifically, we propose (i) to use mutual information (MI) as intermediate supervision to align shifting internal covariates in CL and client-drifting representations in FL, and (ii) to adopt negative entropy (NE) as an intermediate regularizer to penalize overconfident predictions, preserve representational uncertainty, and avoid device-specific collapse. On the theory side, we derive (i) a unified, ISR-agnostic, and non-asymptotic $O(1/\sqrt{T})$ convergence bound that shows the introduced ISR does not violate standard SGD convergence, (ii) a federated drift bound that quantifies the ISR-reduced client drift, (iii) a gradient-alignment guarantee that ensures non-conflicting CL and FL updates under mild bias, and (iv) an explicit escape-time bound that indicates that CL-FL hybrid mixing enlarges effective stochasticity and accelerates escape from strict saddles. Extensive experiments demonstrate that OmniISR consistently improves model performance in both centralized and federated paradigms, reduces the CL-FL gap by 22.60%, and yields 37/48 paired metric wins across multiple FL algorithms.

LG | Nov 13, 2025
DK-Root: A Joint Data-and-Knowledge-Driven Framework for Root Cause Analysis of QoE Degradations in Mobile Networks

Qizhe Li, Haolong Chen, Jiansheng Li et al.

Diagnosing the root causes of Quality of Experience (QoE) degradations in operational mobile networks is challenging due to complex cross-layer interactions among key performance indicators (KPIs) and the scarcity of reliable expert annotations. Although rule-based heuristics can generate labels at scale, they are noisy and coarse-grained, limiting the accuracy of purely data-driven approaches. To address this, we propose DK-Root, a joint data-and-knowledge-driven framework that unifies scalable weak supervision with precise expert guidance for robust root-cause analysis. DK-Root first pretrains an encoder via contrastive representation learning using abundant rule-based labels while explicitly suppressing their noise through a supervised contrastive objective. To supply task-faithful data augmentation, we introduce a class-conditional diffusion model that generates KPI sequences preserving root-cause semantics; by controlling the number of reverse diffusion steps, it produces weak and strong augmentations that improve intra-class compactness and inter-class separability. Finally, the encoder and the lightweight classifier are jointly fine-tuned with scarce expert-verified labels to sharpen decision boundaries. Extensive experiments on a real-world, operator-grade dataset demonstrate state-of-the-art accuracy, with DK-Root surpassing traditional ML and recent semi-supervised time-series methods. Ablations confirm the necessity of the conditional diffusion augmentation and the pretrain-finetune design, validating both representation quality and classification gains.

LG | Dec 9, 2024 | Code
CSI-BERT2: A BERT-inspired Framework for Efficient CSI Prediction and Classification in Wireless Communication and Sensing

Zijian Zhao, Fanyi Meng, Zhonghao Lyu et al.

Channel state information (CSI) is a fundamental component in both wireless communication and sensing systems, enabling critical functions such as radio resource optimization and environmental perception. In wireless sensing, data scarcity and packet loss hinder efficient model training, while in wireless communication, high-dimensional CSI matrices and short coherence times caused by high mobility present challenges in CSI estimation. To address these issues, we propose a unified framework named CSI-BERT2 for CSI prediction and classification tasks, built on CSI-BERT, which adapts BERT to capture the complex relationships among CSI sequences through a bidirectional self-attention mechanism. We introduce a two-stage training method that first uses a masked language model (MLM) to enable the model to learn general feature extraction from scarce datasets in an unsupervised manner, followed by fine-tuning for specific downstream tasks. Specifically, we extend MLM into a mask prediction model (MPM), which efficiently addresses the CSI prediction task. To further enhance the representation capacity for CSI data, we modify the structure of the original CSI-BERT. We introduce an adaptive re-weighting layer (ARL) to enhance subcarrier representation and a multi-layer perceptron (MLP)-based temporal embedding module to mitigate the temporal information loss problem inherent in the original Transformer. Extensive experiments on both real-world collected and simulated datasets demonstrate that CSI-BERT2 achieves state-of-the-art performance across all tasks. Our results further show that CSI-BERT2 generalizes effectively across varying sampling rates and robustly handles discontinuous CSI sequences caused by packet loss, challenges that conventional methods fail to address. The dataset and code are publicly available at https://github.com/RS2002/CSI-BERT2.

CV | Dec 6, 2024 | Code
LoFi: Vision-Aided Label Generator for Wi-Fi Localization and Tracking

Zijian Zhao, Tingwei Chen, Fanyi Meng et al.

Data-driven Wi-Fi localization and tracking have shown great promise due to their lower reliance on specialized hardware compared to model-based methods. However, most existing data collection techniques provide only coarse-grained ground truth or a limited number of labeled points, significantly hindering the advancement of data-driven approaches. While systems like lidar can deliver precise ground truth, their high costs make them inaccessible to many users. To address these challenges, we propose LoFi, a vision-aided label generator for Wi-Fi localization and tracking. LoFi can generate ground truth position coordinates solely from 2D images, offering high precision, low cost, and ease of use. Utilizing our method, we have compiled a Wi-Fi tracking and localization dataset using the ESP32-S3 and a webcam. The code and dataset of this paper are available at https://github.com/RS2002/LoFi.

DC | Apr 14
Three Birds, One Stone: Solving the Communication-Memory-Privacy Trilemma in LLM Fine-tuning Over Wireless Networks with Zeroth-Order Optimization

Zhijie Cai, Yuhao Zheng, Haolong Chen et al.

Federated Learning (FL) offers a promising pathway for collaboratively fine-tuning Large Language Models (LLMs) at the edge; however, this paradigm faces a critical bottleneck: the prohibitive communication and memory overheads incurred by exchanging high-dimensional gradients. Furthermore, recent studies reveal that user training data can still be recovered from these local gradients, undermining the core privacy promise of FL. In this paper, we address this trilemma of communication, memory, and privacy by proposing pAirZero, a novel framework that synergizes Zeroth-Order (ZO) optimization with Over-the-Air (OTA) computation. Uniquely, pAirZero enables resource-constrained devices to submit their local gradient with only bit-level communication loads while participating in federated fine-tuning of LLMs with inference-level memory costs. This approach not only eliminates the high memory requirements needed for LLM fine-tuning but also alleviates the strict synchronization requirements that plague conventional OTA methods. We further formulate a rigorous optimization model to adaptively determine the optimal transmit power and noise levels, ensuring consistent privacy protection regardless of channel conditions. Numerical experiments demonstrate the superiority of pAirZero in enabling secure, efficient LLM fine-tuning over wireless networks, with only 25% peak memory cost on OPT-125M and communication load orders of magnitude lower than conventional methods.
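
To make the zeroth-order idea concrete, the sketch below shows a generic two-point (SPSA-style) gradient estimate in which the random direction is regenerated from a shared seed, so a device would only need to report one scalar per step. This is an illustrative assumption about how ZO updates can be compressed, not pAirZero's actual protocol; the function names, the quadratic toy loss, and all hyperparameters are hypothetical.

```python
import numpy as np

def zo_projected_gradient(loss, theta, seed, eps=1e-3):
    """Two-point estimate of the directional derivative along a random
    direction z drawn from a seeded generator. Needs only two forward passes."""
    z = np.random.default_rng(seed).standard_normal(theta.shape)
    return (loss(theta + eps * z) - loss(theta - eps * z)) / (2 * eps)

def zo_step(theta, scalar, seed, lr):
    """Rebuild the same direction z from the seed and apply the update.
    Anyone holding the seed and the scalar can reproduce this step."""
    z = np.random.default_rng(seed).standard_normal(theta.shape)
    return theta - lr * scalar * z

def loss(theta):
    # Toy quadratic objective standing in for a model's training loss.
    return float(np.sum(theta ** 2))

theta = np.ones(4)
initial_loss = loss(theta)
for step in range(200):
    g = zo_projected_gradient(loss, theta, seed=step)  # one scalar per step
    theta = zo_step(theta, g, seed=step, lr=0.05)
final_loss = loss(theta)
```

Because no backward pass is run, memory stays at the level of inference, and the per-step message is a single number rather than a full gradient vector.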

RO | Dec 7, 2025
FedDSR: Federated Deep Supervision and Regularization Towards Autonomous Driving

Wei-Bin Kou, Guangxu Zhu, Bingyang Cheng et al.

Federated Learning (FL) enables collaborative training of autonomous driving (AD) models across distributed vehicles while preserving data privacy. However, FL encounters critical challenges such as poor generalization and slow convergence due to non-independent and identically distributed (non-IID) data from diverse driving environments. To overcome these obstacles, we introduce Federated Deep Supervision and Regularization (FedDSR), a paradigm that incorporates multi-access intermediate layer supervision and regularization within a federated AD system. Specifically, FedDSR comprises the following strategies: (I) selecting multiple intermediate layers based on predefined architecture-agnostic standards; (II) computing mutual information (MI) and negative entropy (NE) on the selected layers to serve as the intermediate loss and regularizer, which are integrated into the output-layer loss to form a unified optimization objective, enabling comprehensive optimization across the network hierarchy; and (III) aggregating the models that vehicles train under rules (I) and (II) to generate the global model on the central server. By guiding and penalizing the learning of feature representations at intermediate stages, FedDSR enhances model generalization and accelerates model convergence for federated AD. We then take the semantic segmentation task as an example to assess FedDSR and apply FedDSR to multiple model architectures and FL algorithms. Extensive experiments demonstrate that FedDSR achieves up to 8.93% improvement in mIoU and 28.57% reduction in training rounds compared to other FL baselines, making it highly suitable for practical deployment in federated AD ecosystems.

SPJun 13, 2024Code
Modelling the 5G Energy Consumption using Real-world Data: Energy Fingerprint is All You Need

Tingwei Chen, Yantao Wang, Hanzhi Chen et al.

The introduction of 5G technology has revolutionized communications, enabling unprecedented capacity, connectivity, and ultra-fast, reliable communications. However, this leap has led to a substantial increase in energy consumption, presenting a critical challenge for network sustainability. Accurate energy consumption modeling is essential for developing energy-efficient strategies, enabling operators to optimize resource utilization while maintaining network performance. To address this, we propose a novel deep learning model for 5G base station energy consumption estimation based on a real-world dataset. Unlike existing methods, our approach integrates the Base Station Identifier (BSID) as an input feature through an embedding layer, capturing unique energy patterns across different base stations. We further introduce a masked training method and an attention mechanism to enhance generalization and accuracy. Experimental results show significant improvements, reducing Mean Absolute Percentage Error (MAPE) from 12.75% to 4.98%, achieving over 60% performance gain compared to existing models. The source code for our model is available at https://github.com/RS2002/ARL.
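
The reported figures can be checked with a short calculation. The sketch below uses the standard definition of MAPE and computes the relative reduction from 12.75% to 4.98%; the function names are illustrative and are not taken from the linked repository.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (standard definition)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


# Figures quoted in the abstract: MAPE drops from 12.75% to 4.98%.
gain = relative_reduction(12.75, 4.98)  # about 0.609, i.e. just over 60%
```

A relative reduction of roughly 61% is consistent with the "over 60%" gain stated in the abstract.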

CVDec 6, 2024Code
KNN-MMD: Cross Domain Wireless Sensing via Local Distribution Alignment

Zijian Zhao, Zhijie Cai, Tingwei Chen et al.

Wireless sensing has recently found widespread applications in diverse environments, including homes, offices, and public spaces. By analyzing patterns in channel state information (CSI), it is possible to infer human actions for tasks such as person identification, gesture recognition, and fall detection. However, CSI is highly sensitive to environmental changes, where even minor alterations can significantly distort the CSI patterns. This sensitivity often leads to performance degradation or outright failure when applying wireless sensing models trained in one environment to another. To address this challenge, Domain Alignment (DAL) has been widely adopted for cross-domain classification tasks, as it focuses on aligning the global distributions of the source and target domains in feature space. Despite its popularity, DAL often neglects inter-category relationships, which can lead to misalignment between categories across domains, even when global alignment is achieved. To overcome these limitations, we propose K-Nearest Neighbors Maximum Mean Discrepancy (KNN-MMD), a novel few-shot method for cross-domain wireless sensing. Our approach begins by constructing a help set using KNN from the target domain, enabling local alignment between the source and target domains within each category using MMD. Additionally, we address a key instability issue commonly observed in cross-domain methods, where model performance fluctuates sharply between epochs. Further, most existing methods struggle to determine an optimal stopping point during training due to the absence of labeled data from the target domain. Our method resolves this by excluding the support set from the target domain during training and employing it as a validation set to determine the stopping criterion. The dataset and code are publicly available at https://github.com/RS2002/KNN-MMD.
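
To make the idea of per-category alignment concrete, the following is a minimal sketch of the (biased) squared MMD estimator with an RBF kernel, summed over classes. It assumes features are plain lists of floats grouped by class label; the kernel choice, the `gamma` value, and the function names are assumptions for illustration and may differ from the authors' implementation.

```python
import math


def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD between two sample sets."""
    def mean_kernel(a_set, b_set):
        total = sum(rbf_kernel(a, b, gamma) for a in a_set for b in b_set)
        return total / (len(a_set) * len(b_set))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def per_class_mmd(source_by_class, target_by_class, gamma=1.0):
    """Sum squared MMD over the classes present in both domains (local alignment)."""
    shared = set(source_by_class) & set(target_by_class)
    return sum(
        mmd_squared(source_by_class[c], target_by_class[c], gamma) for c in shared
    )
```

Summing the discrepancy class by class, rather than over the pooled features, is what distinguishes local alignment from the global alignment the abstract attributes to DAL.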

LGMar 19, 2024Code
Finding the Missing Data: A BERT-inspired Approach Against Package Loss in Wireless Sensing

Zijian Zhao, Tingwei Chen, Fanyi Meng et al.

Despite the development of various deep learning methods for Wi-Fi sensing, package loss often results in noncontinuous estimation of the Channel State Information (CSI), which negatively impacts the performance of the learning models. To overcome this challenge, we propose a deep learning model based on Bidirectional Encoder Representations from Transformers (BERT) for CSI recovery, named CSI-BERT. CSI-BERT can be trained in a self-supervised manner on the target dataset without the need for additional data. Furthermore, unlike traditional interpolation methods that focus on one subcarrier at a time, CSI-BERT captures the sequential relationships across different subcarriers. Experimental results demonstrate that CSI-BERT achieves lower error rates and faster speed compared to traditional interpolation methods, even when facing high loss rates. Moreover, by harnessing the recovered CSI obtained from CSI-BERT, other deep learning models like Residual Network and Recurrent Neural Network can achieve an average increase in accuracy of approximately 15% in Wi-Fi sensing tasks. The collected dataset WiGesture and code for our model are publicly available at https://github.com/RS2002/CSI-BERT.

ITMay 4
Modal-Based Multi-Scatterer Channel Model for Localized Radiomap Extrapolation

Wenli Li, Bin Wang, Guangxu Zhu et al.

A radiomap, representing the spatial distribution of wireless signal strength within a specific region, is fundamentally determined by the local propagation channel and finds extensive applications in network planning and optimization. The channel model is inherently linked to electromagnetic (EM) wave propagation, and the advent of high-frequency communications presents a new picture - microscopic (and thus negligible) scatterers in lower frequency bands become mesoscopic, rendering non-negligible EM effects. In this paper, we establish a channel model for multiple scatterers based on a spherical wave mode expansion. The source radiation, single scatterer response and multiple scatterer interactions are formed in the superposition of spherical-wave modes, capturing the multi-path effect in wave perspective. Iterative methods are used to handle the massive coupling between scatterers. This forward model is converted to an inverse optimization problem, where the scattering responses and the scatterer locations are jointly learned from sparse field measurements. A simplified approximate model is then introduced, employing fewer and simpler low-order modes while still allowing a larger number of more densely placed scatterers. Simulation results demonstrate that the proposed model accurately reconstructs and extrapolates radiomaps in both the spatial domain and the beam domain. Overall, the proposed framework offers a physically interpretable approach to localized propagation modeling.

LGMay 1
AdaMeZO: Adam-style Zeroth-Order Optimizer for LLM Fine-tuning Without Maintaining the Moments

Zhijie Cai, Haolong Chen, Guangxu Zhu

Fine-tuning LLMs is necessary for various dedicated downstream tasks, but classic backpropagation-based fine-tuning methods require substantial GPU memory. To this end, a recent work, MeZO, which relies solely on forward passes to fine-tune LLMs, significantly reduces GPU requirements at the cost of slower convergence due to its indifference to loss landscapes. Standard solutions, such as Adam, explore loss landscapes by estimating the first- and second-order moments and storing them in memory to guide the model's movement through dimensions with lower curvature and vice versa. However, directly applying Adam negates MeZO's advantage as it will triple the memory requirement. In light of this, we propose AdaMeZO, a zeroth-order optimizer that leverages Adam-style first- and second-moment estimates without maintaining them in memory. We present a theoretical analysis of AdaMeZO, corroborated by extensive experiments demonstrating AdaMeZO's performance, showing that AdaMeZO can outperform MeZO while requiring up to 70% fewer forward passes. Trajectory visualizations affirm AdaMeZO's ability to adapt to diverse loss landscapes.
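
The forward-pass-only update that MeZO-style methods rely on can be sketched as a two-point zeroth-order step. The code below is a toy illustration on a quadratic loss, not AdaMeZO itself and not the authors' code; the function names, learning rate, and perturbation scale are assumptions chosen for the demo.

```python
import random


def zo_sgd_step(theta, loss_fn, lr, eps, seed):
    """One zeroth-order SGD step using only two loss evaluations."""
    # The perturbation is drawn from a seed, so it could be regenerated
    # instead of stored; this sketch keeps it in a list for clarity.
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in theta]
    plus = [t + eps * zi for t, zi in zip(theta, z)]
    minus = [t - eps * zi for t, zi in zip(theta, z)]
    # Central difference along z approximates the directional derivative.
    projected_grad = (loss_fn(plus) - loss_fn(minus)) / (2.0 * eps)
    return [t - lr * projected_grad * zi for t, zi in zip(theta, z)]


def quadratic_loss(theta):
    """Toy objective standing in for a model's loss."""
    return 0.5 * sum(t * t for t in theta)


def run_demo(steps=200, lr=0.05, eps=1e-3, dim=10):
    """Return the loss before and after `steps` zeroth-order updates."""
    theta = [1.0] * dim
    initial = quadratic_loss(theta)
    for step in range(steps):
        theta = zo_sgd_step(theta, quadratic_loss, lr, eps, seed=step)
    return initial, quadratic_loss(theta)
```

No gradients or optimizer moments are stored here, which is the memory advantage the abstract describes; an Adam-style variant would additionally need moment estimates, and how AdaMeZO obtains them without storing them is not shown in this sketch.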

ITApr 1, 2024
Rethinking Resource Management in Edge Learning: A Joint Pre-training and Fine-tuning Design Paradigm

Zhonghao Lyu, Yuchen Li, Guangxu Zhu et al.

In some applications, edge learning is shifting its focus from conventional learning from scratch to a new two-stage paradigm that unifies pre-training and task-specific fine-tuning. This paper considers the problem of joint communication and computation resource management in a two-stage edge learning system. In this system, model pre-training is first conducted at an edge server via centralized learning on local pre-stored general data, and then task-specific fine-tuning is performed at edge devices based on the pre-trained model via federated edge learning. For the two-stage learning model, we first analyze the convergence behavior (in terms of the average squared gradient norm bound), which characterizes the impacts of various system parameters such as the number of learning rounds and batch sizes in the two stages on the convergence rate. Based on our analytical results, we then propose a joint communication and computation resource management design to minimize an average squared gradient norm bound, subject to constraints on the transmit power, overall system energy consumption, and training delay. The decision variables include the number of learning rounds, batch sizes, clock frequencies, and transmit power control for both pre-training and fine-tuning stages. Finally, numerical results are provided to evaluate the effectiveness of our proposed design. It is shown that the proposed joint resource management over the pre-training and fine-tuning stages balances the system performance trade-off among training accuracy, delay, and energy consumption well. The proposed design is also shown to effectively leverage the inherent trade-off between pre-training and fine-tuning, which arises from the differences in data distribution between pre-stored general data and real-time task-specific data, thus efficiently optimizing overall system performance.

SPMar 25, 2024
RadioGAT: A Joint Model-based and Data-driven Framework for Multi-band Radiomap Reconstruction via Graph Attention Networks

Xiaojie Li, Songyang Zhang, Hang Li et al.

Multi-band radiomap reconstruction (MB-RMR) is a key component in wireless communications for tasks such as spectrum management and network planning. However, traditional machine-learning-based MB-RMR methods, which rely heavily on simulated data or complete structured ground truth, face significant deployment challenges. These challenges stem from the differences between simulated and actual data, as well as the scarcity of real-world measurements. To address these challenges, our study presents RadioGAT, a novel framework based on Graph Attention Network (GAT) tailored for MB-RMR within a single area, eliminating the need for multi-region datasets. RadioGAT innovatively merges model-based spatial-spectral correlation encoding with data-driven radiomap generalization, thus minimizing the reliance on extensive data sources. The framework begins by transforming sparse multi-band data into a graph structure through an innovative encoding strategy that leverages radio propagation models to capture the spatial-spectral correlation inherent in the data. This graph-based representation not only simplifies data handling but also enables tailored label sampling during training, significantly enhancing the framework's adaptability for deployment. Subsequently, the GAT is employed to generalize the radiomap information across various frequency bands. Extensive experiments using raytracing datasets based on real-world environments have demonstrated RadioGAT's enhanced accuracy in supervised learning settings and its robustness in semi-supervised scenarios. These results underscore RadioGAT's effectiveness and practicality for MB-RMR in environments with limited data availability.

ITApr 9, 2024
Collaborative Edge AI Inference over Cloud-RAN

Pengfei Zhang, Dingzhu Wen, Guangxu Zhu et al.

In this paper, a cloud radio access network (Cloud-RAN) based collaborative edge AI inference architecture is proposed. Specifically, geographically distributed devices capture real-time noise-corrupted sensory data samples and extract the noisy local feature vectors, which are then aggregated at each remote radio head (RRH) to suppress sensing noise. To realize efficient uplink feature aggregation, we allow each RRH to receive local feature vectors from all devices over the same resource blocks simultaneously by leveraging an over-the-air computation (AirComp) technique. Thereafter, these aggregated feature vectors are quantized and transmitted to a central processor (CP) for further aggregation and downstream inference tasks. Our aim in this work is to maximize the inference accuracy via a surrogate accuracy metric called discriminant gain, which measures the discernibility of different classes in the feature space. The key challenges lie in simultaneously suppressing the coupled sensing noise, the AirComp distortion caused by hostile wireless channels, and the quantization error resulting from the limited capacity of fronthaul links. To address these challenges, this work proposes a joint transmit precoding, receive beamforming, and quantization error control scheme to enhance the inference accuracy. Extensive numerical experiments demonstrate the effectiveness and superiority of our proposed optimization algorithm compared to various baselines.

ITFeb 5, 2024
Fast and Accurate Cooperative Radio Map Estimation Enabled by GAN

Zezhong Zhang, Guangxu Zhu, Junting Chen et al.

In the 6G era, real-time radio resource monitoring and management are urgently needed to support diverse wireless-empowered applications. This calls for fast and accurate estimation of the distribution of the radio resources, which is usually represented by the spatial signal power strength over the geographical environment, known as a radio map. In this paper, we present a cooperative radio map estimation (CRME) approach enabled by the generative adversarial network (GAN), called GAN-CRME, which features fast and accurate radio map estimation without the transmitters' information. The radio map is inferred by exploiting the interaction between distributed received signal strength (RSS) measurements at mobile users and the geographical map using a deep neural network estimator, resulting in low data-acquisition cost and computational complexity. Moreover, a GAN-based learning algorithm is proposed to boost the inference capability of the deep neural network estimator by exploiting the power of generative AI. Simulation results showcase that the proposed GAN-CRME is even capable of coarse error-correction when the geographical map information is inaccurate.

CVJan 3, 2025
Enhancing Large Vision Model in Street Scene Semantic Understanding through Leveraging Posterior Optimization Trajectory

Wei-Bin Kou, Qingfeng Lin, Ming Tang et al.

To improve the generalization of the autonomous driving (AD) perception model, vehicles need to update the model over time based on the continuously collected data. As time progresses, the amount of data fitted by the AD model expands, which helps to improve the AD model generalization substantially. However, such ever-expanding data is a double-edged sword for the AD model. Specifically, as the fitted data volume grows to exceed the AD model's fitting capacity, the AD model is prone to under-fitting. To address this issue, we propose to use pretrained Large Vision Models (LVMs) as the backbone, coupled with a downstream perception head, to understand AD semantic information. This design can not only surmount the aforementioned under-fitting problem due to LVMs' powerful fitting capabilities, but also enhance the perception generalization thanks to LVMs' vast and diverse training data. On the other hand, to mitigate vehicles' computational burden of training the perception head while running the LVM backbone, we introduce a Posterior Optimization Trajectory (POT)-Guided optimization scheme (POTGui) to accelerate the convergence. Concretely, we propose a POT Generator (POTGen) to generate the posterior (future) optimization direction in advance to guide the current optimization iteration, through which the model can generally converge within 10 epochs. Extensive experiments demonstrate that the proposed method improves the performance by over 66.48% and converges over 6 times faster compared to the existing state-of-the-art approach.

ROFeb 5, 2025
Label Anything: An Interpretable, High-Fidelity and Prompt-Free Annotator

Wei-Bin Kou, Guangxu Zhu, Rongguang Ye et al.

Learning-based street scene semantic understanding in autonomous driving (AD) has advanced significantly recently, but the performance of the AD model is heavily dependent on the quantity and quality of the annotated training data. However, traditional manual labeling incurs a high cost to annotate the vast amount of data required for training a robust model. To mitigate this cost of manual labeling, we propose a Label Anything Model (denoted as LAM), serving as an interpretable, high-fidelity, and prompt-free data annotator. Specifically, we first incorporate a pretrained Vision Transformer (ViT) to extract the latent features. On top of ViT, we propose a semantic class adapter (SCA) and an optimization-oriented unrolling algorithm (OptOU), both with a small number of trainable parameters. SCA is proposed to fuse ViT-extracted features to consolidate the basis of the subsequent automatic annotation. OptOU consists of multiple cascading layers, and each layer contains an optimization formulation to align its output with the ground truth as closely as possible, through which OptOU is interpretable rather than a learning-based black box. In addition, training SCA and OptOU requires only a single pre-annotated RGB seed image, owing to their small number of learnable parameters. Extensive experiments clearly demonstrate that the proposed LAM can generate high-fidelity annotations (almost 100% in mIoU) for multiple real-world datasets (i.e., Camvid, Cityscapes, and Apolloscapes) and the CARLA simulation dataset.

CLJan 11, 2025
First Token Probability Guided RAG for Telecom Question Answering

Tingwei Chen, Jiayi Chen, Zijian Zhao et al.

Large Language Models (LLMs) have garnered significant attention for their impressive general-purpose capabilities. For applications requiring intricate domain knowledge, Retrieval-Augmented Generation (RAG) has shown a distinct advantage in incorporating domain-specific information into LLMs. However, existing RAG research has not fully addressed the challenges of Multiple Choice Question Answering (MCQA) in telecommunications, particularly in terms of retrieval quality and mitigating hallucinations. To tackle these challenges, we propose a novel first token probability guided RAG framework. This framework leverages confidence scores to optimize key hyperparameters, such as chunk number and chunk window size, while dynamically adjusting the context. Our method starts by retrieving the most relevant chunks and generates a single token as the potential answer. The probabilities of all options are then normalized to serve as confidence scores, which guide the dynamic adjustment of the context. By iteratively optimizing the hyperparameters based on these confidence scores, we can continuously improve RAG performance. We conducted experiments to validate the effectiveness of our framework, demonstrating its potential to enhance accuracy in domain-specific MCQA tasks.
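
The normalization step described above can be illustrated with a small sketch. It assumes the first-token probabilities are already available as a mapping from token string to probability; the option labels, the uniform fallback, and the function name are assumptions for illustration and are not taken from the paper.

```python
def option_confidences(first_token_probs, options=("A", "B", "C", "D")):
    """Renormalize first-token probabilities over the answer options only."""
    picked = {opt: first_token_probs.get(opt, 0.0) for opt in options}
    total = sum(picked.values())
    if total <= 0.0:
        # No probability mass on any option: fall back to a uniform score.
        return {opt: 1.0 / len(options) for opt in options}
    return {opt: prob / total for opt, prob in picked.items()}


# Hypothetical first-token distribution; "The" stands for non-option tokens.
scores = option_confidences(
    {"A": 0.10, "B": 0.30, "C": 0.05, "D": 0.05, "The": 0.50}
)
best_option = max(scores, key=scores.get)
```

A low maximum score would then signal that the retrieved context should be adjusted, which is the role the abstract assigns to the confidence scores.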

LGFeb 14, 2025
AI-in-the-Loop Sensing and Communication Joint Design for Edge Intelligence

Zhijie Cai, Xiaowen Cao, Xu Chen et al.

Recent breakthroughs in artificial intelligence (AI), wireless communications, and sensing technologies have accelerated the evolution of edge intelligence. However, conventional systems still grapple with issues such as low communication efficiency, redundant data acquisition, and poor model generalization. To overcome these challenges, we propose an innovative framework that enhances edge intelligence through AI-in-the-loop joint sensing and communication (JSAC). This framework features an AI-driven closed-loop control architecture that jointly optimizes system resources, thereby delivering superior system-level performance. A key contribution of our work is establishing an explicit relationship between validation loss and the system's tunable parameters. This insight enables dynamic reduction of the generalization error through AI-driven closed-loop control. Specifically, for sensing control, we introduce an adaptive data collection strategy based on gradient importance sampling, allowing edge devices to autonomously decide when to terminate data acquisition and how to allocate sample weights based on real-time model feedback. For communication control, drawing inspiration from stochastic gradient Langevin dynamics (SGLD), our joint optimization of transmission power and batch size converts channel and data noise into gradient perturbations that help mitigate overfitting. Experimental evaluations demonstrate that our framework reduces communication energy consumption by up to 77 percent and sensing costs measured by the number of collected samples by up to 52 percent while significantly improving model generalization -- with up to 58 percent reductions of the final validation loss. It validates that the proposed scheme can harvest the mutual benefit of AI and JSAC systems by incorporating the model itself into the control loop of the system.

LGOct 18, 2024
Personalizing Low-Rank Bayesian Neural Networks Via Federated Learning

Boning Zhang, Dongzhu Liu, Osvaldo Simeone et al.

To support real-world decision-making, it is crucial for models to be well-calibrated, i.e., to assign reliable confidence estimates to their predictions. Uncertainty quantification is particularly important in personalized federated learning (PFL), as participating clients typically have small local datasets, making it difficult to unambiguously determine optimal model parameters. Bayesian PFL (BPFL) methods can potentially enhance calibration, but they often come with considerable computational and memory requirements due to the need to track the variances of all the individual model parameters. Furthermore, different clients may exhibit heterogeneous uncertainty levels owing to varying local dataset sizes and distributions. To address these challenges, we propose LR-BPFL, a novel BPFL method that learns a global deterministic model along with personalized low-rank Bayesian corrections. To tailor the local model to each client's inherent uncertainty level, LR-BPFL incorporates an adaptive rank selection mechanism. We evaluate LR-BPFL across a variety of datasets, demonstrating its advantages in terms of calibration, accuracy, as well as computational and memory requirements.

SPOct 28, 2025
Trajectory Design for UAV-Based Low-Altitude Wireless Networks in Unknown Environments: A Digital Twin-Assisted TD3 Approach

Jihao Luo, Zesong Fei, Xinyi Wang et al.

Unmanned aerial vehicles (UAVs) are emerging as key enablers for low-altitude wireless network (LAWN), particularly when terrestrial networks are unavailable. In such scenarios, the environmental topology is typically unknown; hence, designing efficient and safe UAV trajectories is essential yet challenging. To address this, we propose a digital twin (DT)-assisted training and deployment framework. In this framework, the UAV transmits integrated sensing and communication signals to provide communication services to ground users, while simultaneously collecting echoes that are uploaded to the DT server to progressively construct virtual environments (VEs). These VEs accelerate model training and are continuously updated with real-time UAV sensing data during deployment, supporting decision-making and enhancing flight safety. Based on this framework, we further develop a trajectory design scheme that integrates simulated annealing for efficient user scheduling with the twin-delayed deep deterministic policy gradient algorithm for continuous trajectory design, aiming to minimize mission completion time while ensuring obstacle avoidance. Simulation results demonstrate that the proposed approach achieves faster convergence, higher flight safety, and shorter mission completion time compared with baseline methods, providing a robust and efficient solution for LAWN deployment in unknown environments.

LGAug 17, 2025
STM3: Mixture of Multiscale Mamba for Long-Term Spatio-Temporal Time-Series Prediction

Haolong Chen, Liang Zhang, Zhengyuan Xin et al.

Recently, spatio-temporal time-series prediction has developed rapidly, yet existing deep learning methods struggle to learn complex long-term spatio-temporal dependencies efficiently. Long-term spatio-temporal dependency learning brings two new challenges: 1) the long-term temporal sequence naturally includes multiscale information, which is hard to extract efficiently; 2) the multiscale temporal information from different nodes is highly correlated and hard to model. To address these challenges, we propose an efficient Spatio-Temporal Multiscale Mamba (STM2) that includes a multiscale Mamba architecture to capture the multiscale information efficiently and simultaneously, and an adaptive graph causal convolution network to learn the complex multiscale spatio-temporal dependency. STM2 includes hierarchical information aggregation for different-scale information that guarantees their distinguishability. To capture diverse temporal dynamics across all spatial nodes more efficiently, we further propose an enhanced version termed Spatio-Temporal Mixture of Multiscale Mamba (STM3) that employs a special Mixture-of-Experts architecture, including a more stable routing strategy and a causal contrastive learning strategy to enhance the scale distinguishability. We prove that STM3 has much better routing smoothness and guarantees pattern disentanglement for each expert. Extensive experiments on real-world benchmarks demonstrate STM2/STM3's superior performance, achieving state-of-the-art results in long-term spatio-temporal time-series prediction.

NIJul 13, 2025
A Disentangled Representation Learning Framework for Low-altitude Network Coverage Prediction

Xiaojie Li, Zhijie Cai, Nan Qi et al.

The expansion of the low-altitude economy has underscored the significance of Low-Altitude Network Coverage (LANC) prediction for designing aerial corridors. While accurate LANC forecasting hinges on the antenna beam patterns of Base Stations (BSs), these patterns are typically proprietary and not readily accessible. Operational parameters of BSs, which inherently contain beam information, offer an opportunity for data-driven low-altitude coverage prediction. However, collecting extensive low-altitude road test data is cost-prohibitive, often yielding only sparse samples per BS. This scarcity results in two primary challenges: imbalanced feature sampling due to limited variability in high-dimensional operational parameters against the backdrop of substantial changes in low-dimensional sampling locations, and diminished generalizability stemming from insufficient data samples. To overcome these obstacles, we introduce a dual strategy comprising expert knowledge-based feature compression and disentangled representation learning. The former reduces feature space complexity by leveraging communications expertise, while the latter enhances model generalizability through the integration of propagation models and distinct subnetworks that capture and aggregate the semantic representations of latent features. Experimental evaluation confirms the efficacy of our framework, yielding a 7% reduction in error compared to the best baseline algorithm. Real-network validations further attest to its reliability, achieving practical prediction accuracy with MAE errors at the 5dB level.

ROMay 1, 2025
FedEMA: Federated Exponential Moving Averaging with Negative Entropy Regularizer in Autonomous Driving

Wei-Bin Kou, Guangxu Zhu, Bingyang Cheng et al.

Street Scene Semantic Understanding (denoted as S3U) is a crucial but complex task for autonomous driving (AD) vehicles. Their inference models typically face poor generalization due to domain shift. Federated Learning (FL) has emerged as a promising paradigm for enhancing the generalization of AD models through privacy-preserving distributed learning. However, these FL AD models face significant temporal catastrophic forgetting when deployed in dynamically evolving environments, where continuous adaptation causes abrupt erosion of historical knowledge. This paper proposes Federated Exponential Moving Average (FedEMA), a novel framework that addresses this challenge through two integral innovations: (I) server-side preservation of the model's historical fitting capability by fusing the current FL round's aggregation model with a proposed exponential moving average (EMA) model from the previous FL round; (II) vehicle-side negative entropy regularization to prevent FL models from possibly overfitting to EMA-introduced temporal patterns. These two strategies give FedEMA a dual-objective optimization that balances model generalization and adaptability. In addition, we conduct a theoretical convergence analysis for the proposed FedEMA. Extensive experiments on both the Cityscapes dataset and the Camvid dataset demonstrate FedEMA's superiority over existing approaches, showing 7.12% higher mean Intersection-over-Union (mIoU).
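
The server-side fusion can be pictured with a plain exponential moving average over flattened model weights. The abstract does not give the exact fusion rule, so the convex combination below, the `beta` parameter, and the function name are assumptions used only to illustrate the general idea.

```python
def ema_fuse(ema_prev, aggregated, beta):
    """Blend the previous EMA weights with the current aggregated weights."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    if len(ema_prev) != len(aggregated):
        raise ValueError("weight vectors must have equal length")
    # beta close to 1 favors historical knowledge; close to 0 favors the new round.
    return [beta * e + (1.0 - beta) * a for e, a in zip(ema_prev, aggregated)]


# Hypothetical two-parameter model, equal weighting of history and current round.
fused = ema_fuse([1.0, 2.0], [3.0, 4.0], beta=0.5)
```

With `beta = 0` the server keeps only the current aggregate, which corresponds to the no-memory setting in which the abstract says forgetting occurs.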

ROApr 25, 2025
Opportunistic Collaborative Planning with Large Vision Model Guided Control and Joint Query-Service Optimization

Jiayi Chen, Shuai Wang, Guoliang Li et al.

Navigating autonomous vehicles in open scenarios is a challenge due to the difficulties in handling unseen objects. Existing solutions either rely on small models that struggle with generalization or large models that are resource-intensive. While collaboration between the two offers a promising solution, the key challenge is deciding when and how to engage the large model. To address this issue, this paper proposes opportunistic collaborative planning (OCP), which seamlessly integrates efficient local models with powerful cloud models through two key innovations. First, we propose large vision model guided model predictive control (LVM-MPC), which leverages the cloud for LVM perception and decision making. The cloud output serves as a global guidance for a local MPC, thereby forming a closed-loop perception-to-control system. Second, to determine the best timing for large model query and service, we propose collaboration timing optimization (CTO), including object detection confidence thresholding (ODCT) and cloud forward simulation (CFS), to decide when to seek cloud assistance and when to offer cloud service. Extensive experiments show that the proposed OCP outperforms existing methods in terms of both navigation time and success rate.

LGFeb 23, 2022
Towards Tailored Models on Private AIoT Devices: Federated Direct Neural Architecture Search

Chunhui Zhang, Xiaoming Yuan, Qianyun Zhang et al.

Neural networks often encounter various stringent resource constraints when deployed on edge devices. To tackle these problems with less human effort, automated machine learning has become popular for finding various neural architectures that fit diverse Artificial Intelligence of Things (AIoT) scenarios. Recently, to prevent the leakage of private information while enabling automated machine intelligence, there is an emerging trend to integrate federated learning and neural architecture search (NAS). Although promising as it may seem, the coupling of difficulties from both tenets makes the algorithm development quite challenging. In particular, how to efficiently search the optimal neural architecture directly from massive non-independent and identically distributed (non-IID) data among AIoT devices in a federated manner is a hard nut to crack. In this paper, to tackle this challenge, by leveraging the advances in ProxylessNAS, we propose a Federated Direct Neural Architecture Search (FDNAS) framework that allows for hardware-friendly NAS from non-IID data across devices. To further adapt to both various data distributions and different types of devices with heterogeneous embedded hardware platforms, inspired by meta-learning, a Cluster Federated Direct Neural Architecture Search (CFDNAS) framework is proposed to achieve device-aware NAS, in the sense that each device can learn a tailored deep learning model for its particular data distribution and hardware constraint. Extensive experiments on non-IID datasets have shown the state-of-the-art accuracy-efficiency trade-offs achieved by the proposed solution in the presence of both data and device heterogeneity.

SPJan 21, 2022
Vertical Federated Edge Learning with Distributed Integrated Sensing and Communication

Peixi Liu, Guangxu Zhu, Wei Jiang et al.

This letter studies a vertical federated edge learning (FEEL) system for collaborative object/human motion recognition by exploiting distributed integrated sensing and communication (ISAC). In this system, distributed edge devices first send wireless signals to sense targeted objects/humans, and then exchange intermediate computed vectors (instead of raw sensing data) for collaborative recognition while preserving data privacy. To boost the spectrum and hardware utilization efficiency for FEEL, we exploit ISAC for both target sensing and data exchange, by employing dedicated frequency-modulated continuous-wave (FMCW) signals at each edge device. Under this setup, we propose a vertical FEEL framework for realizing the recognition based on the collected multi-view wireless sensing data. In this framework, each edge device owns an individual local L-model to transform its sensing data into an intermediate vector with relatively low dimensions, which is then transmitted to a coordinating edge device for final output via a common downstream S-model. By considering a human motion recognition task, experimental results show that our vertical FEEL based approach achieves recognition accuracy up to 98% with an improvement up to 8% compared to the benchmarks, including on-device training and horizontal FEEL.

NIJul 24, 2021
Accelerating Federated Edge Learning via Optimized Probabilistic Device Scheduling

Maojun Zhang, Guangxu Zhu, Shuai Wang et al.

The popular federated edge learning (FEEL) framework allows privacy-preserving collaborative model training via frequent exchange of learning updates between edge devices and the server. Due to the constrained bandwidth, only a subset of devices can upload their updates at each communication round. This has led to an active research area in FEEL studying the optimal device scheduling policy for minimizing communication time. However, owing to the difficulty in quantifying the exact communication time, prior work in this area can only tackle the problem partially by considering either the number of communication rounds or the per-round latency, while the total communication time is determined by both metrics. To close this gap, we make the first attempt in this paper to formulate and solve the communication time minimization problem. We first derive a tight bound to approximate the communication time through cross-disciplinary effort involving both learning theory for convergence analysis and communication theory for per-round latency analysis. Building on the analytical result, an optimized probabilistic scheduling policy is derived in closed form by solving the approximate communication time minimization problem. It is found that the optimized policy gradually turns its priority from suppressing the remaining communication rounds to reducing per-round latency as the training process evolves. The effectiveness of the proposed scheme is demonstrated via a use case on collaborative 3D object detection in autonomous driving.

SPJul 20, 2021
Accelerating Edge Intelligence via Integrated Sensing and Communication

Tong Zhang, Shuai Wang, Guoliang Li et al.

Realizing edge intelligence consists of sensing, communication, training, and inference stages. Conventionally, the sensing and communication stages are executed sequentially, which results in an excessive amount of dataset generation and uploading time. This paper proposes to accelerate edge intelligence via integrated sensing and communication (ISAC). As such, the sensing and communication stages are merged so as to make the best use of the wireless signals for the dual purpose of dataset generation and uploading. However, ISAC also introduces additional interference between sensing and communication functionalities. To address this challenge, this paper proposes a classification error minimization formulation to design the ISAC beamforming and time allocation. The globally optimal solution is derived via the rank-1 guaranteed semidefinite relaxation, and performance analysis is performed to quantify the ISAC gain over that of conventional edge intelligence. Simulation results are provided to verify the effectiveness of the proposed ISAC-assisted edge intelligence system. Interestingly, we find that ISAC is always beneficial when the duration of generating a sample is more than the duration of uploading a sample. Otherwise, the ISAC gain can vanish or even be negative. Nevertheless, we still derive a sufficient condition under which a positive ISAC gain is feasible.

ITApr 20, 2021
Turning Channel Noise into an Accelerator for Over-the-Air Principal Component Analysis

Zezhong Zhang, Guangxu Zhu, Rui Wang et al.

In recent years, attempts to distill mobile data into useful knowledge have led to the deployment of machine learning algorithms at the network edge. Principal component analysis (PCA) is a classic technique for extracting the linear structure of a dataset, which is useful for feature extraction and data compression. In this work, we propose the deployment of distributed PCA over a multi-access channel based on the algorithm of stochastic gradient descent to learn the dominant feature space of a distributed dataset at multiple devices. Over-the-air aggregation is adopted to reduce the multi-access latency, giving the name over-the-air PCA. The novelty of this design lies in exploiting channel noise to accelerate the descent in the region around each saddle point encountered by gradient descent, thereby increasing the convergence speed of over-the-air PCA. The idea is materialized by proposing a power-control scheme which detects the type of descent region and controls the level of channel noise accordingly. The scheme is proved to achieve a faster convergence rate than in the case without power control.
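
The role of noise near a saddle point can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes an Oja-style normalized update, devices that hold small local covariance matrices directly, and a fixed Gaussian noise level in place of the power-control scheme described in the abstract. The names `noisy_distributed_pca`, `local_covs`, and `noise_std` are hypothetical.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def noisy_distributed_pca(local_covs, w0, steps=200, lr=0.5, noise_std=0.0, seed=0):
    """Estimate the dominant eigenvector from noisily averaged local updates.

    Assumption: each device k contributes C_k w, and the server observes the
    average of these vectors plus additive Gaussian noise (a stand-in for
    channel noise in over-the-air aggregation).
    """
    rng = random.Random(seed)
    w = normalize(w0)
    dim = len(w)
    for _ in range(steps):
        local_updates = [mat_vec(c, w) for c in local_covs]
        aggregated = [
            sum(u[i] for u in local_updates) / len(local_updates)
            + rng.gauss(0.0, noise_std)
            for i in range(dim)
        ]
        # Ascent step followed by projection back onto the unit sphere.
        w = normalize([w_i + lr * g_i for w_i, g_i in zip(w, aggregated)])
    return w


# Two devices whose average covariance is diag(4, 1).
covs = [[[3.0, 0.0], [0.0, 1.0]], [[5.0, 0.0], [0.0, 1.0]]]
saddle = [0.0, 1.0]  # the non-dominant eigenvector is a saddle point

stuck = noisy_distributed_pca(covs, saddle, noise_std=0.0)
escaped = noisy_distributed_pca(covs, saddle, noise_std=0.01)
```

Started exactly at the saddle, the noise-free run never leaves it, while the noisy run drifts off and converges toward the dominant direction `[±1, 0]`.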

ITDec 30, 2018
Broadband Analog Aggregation for Low-Latency Federated Edge Learning (Extended Version)

Guangxu Zhu, Yong Wang, Kaibin Huang

The popularity of mobile devices results in the availability of enormous data and computational resources at the network edge. To leverage the data and resources, a new machine learning paradigm, called edge learning, has emerged where learning algorithms are deployed at the edge for providing fast and intelligent services to mobile users. While computing speeds are advancing rapidly, the communication latency is becoming the bottleneck of fast edge learning. To address this issue, this work is focused on designing a low-latency multi-access scheme for edge learning. We consider a popular framework, federated edge learning (FEEL), where edge-server and on-device learning are synchronized to train a model without violating user-data privacy. It is proposed that model updates simultaneously transmitted by devices over broadband channels should be analog aggregated "over-the-air" by exploiting the superposition property of a multi-access channel. Thereby, "interference" is harnessed to provide fast implementation of the model aggregation. This results in dramatic latency reduction compared with the traditional orthogonal access (i.e., OFDMA). In this work, the performance of FEEL is characterized targeting a single-cell random network. First, due to power alignment between devices as required for aggregation, a fundamental tradeoff is shown to exist between the update-reliability and the expected update-truncation ratio. This motivates the design of an opportunistic scheduling scheme for FEEL that selects devices within a distance threshold. This scheme is shown using real datasets to yield satisfactory learning performance in the presence of high mobility. Second, both the multi-access latency of the proposed analog aggregation and the OFDMA scheme are analyzed. Their ratio, which quantifies the latency reduction of the former, is proved to scale almost linearly with device population.
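
The superposition idea can be sketched in a few lines. This is a simplified illustration, not the paper's scheme: it assumes real-valued scalar channel gains known at each device, channel-inversion precoding for power alignment, and a hypothetical `gain_threshold` below which a device is truncated (skipped) because inverting a very weak channel would need too much power.

```python
import random


def over_the_air_average(updates, channel_gains, noise_std=0.0,
                         gain_threshold=0.1, seed=0):
    """Average model updates via simulated analog aggregation.

    All active devices transmit at once; the channel adds their signals,
    so the server receives the sum in a single transmission.
    """
    rng = random.Random(seed)
    # Truncation: devices in deep fade do not transmit.
    active = [k for k, h in enumerate(channel_gains) if abs(h) >= gain_threshold]
    if not active:
        raise ValueError("no device passes the channel-gain threshold")

    dim = len(updates[0])
    received = [0.0] * dim
    for k in active:
        h = channel_gains[k]
        precoder = 1.0 / h  # channel inversion aligns received amplitudes
        for i in range(dim):
            received[i] += h * precoder * updates[k][i]

    # Additive receiver noise, then scale the sum to an average.
    received = [r + rng.gauss(0.0, noise_std) for r in received]
    return [r / len(active) for r in received], active


updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
gains = [0.5, 0.05, 2.0]  # the second device is in deep fade
average, active_devices = over_the_air_average(updates, gains)
```

In this toy setting the second device is truncated, so the server recovers the average of the first and third updates. Raising `gain_threshold` truncates more devices, which is the reliability-versus-truncation tension the abstract refers to.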

ITDec 5, 2018
Wireless Data Acquisition for Edge Learning: Data-Importance Aware Retransmission

Dongzhu Liu, Guangxu Zhu, Jun Zhang et al.

By deploying machine-learning algorithms at the network edge, edge learning can leverage the enormous real-time data generated by billions of mobile devices to train AI models, which enable intelligent mobile applications. In this emerging research area, one key direction is to efficiently utilize radio resources for wireless data acquisition to minimize the latency of executing a learning task at an edge server. Along this direction, we consider the specific problem of retransmission decision in each communication round to ensure both reliability and quantity of those training data for accelerating model convergence. To solve the problem, a new retransmission protocol called data-importance aware automatic-repeat-request (importance ARQ) is proposed. Unlike the classic ARQ focusing merely on reliability, importance ARQ selectively retransmits a data sample based on its uncertainty, which helps learning and can be measured using the model under training. Underpinning the proposed protocol is a derived elegant communication-learning relation between two corresponding metrics, i.e., signal-to-noise ratio (SNR) and data uncertainty. This relation facilitates the design of a simple threshold-based policy for importance ARQ. The policy is first derived based on the classic classifier model of support vector machine (SVM), where the uncertainty of a data sample is measured by its distance to the decision boundary. The policy is then extended to the more complex model of convolutional neural networks (CNN), where data uncertainty is measured by entropy. Extensive experiments have been conducted for both the SVM and CNN using real datasets with balanced and imbalanced distributions. Experimental results demonstrate that importance ARQ effectively copes with channel fading and noise in wireless data acquisition to achieve faster model convergence than the conventional channel-aware ARQ.
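
The two uncertainty measures named above are easy to compute, as the sketch below shows. The retransmission rule in `needs_retransmission` is a hypothetical linear threshold chosen only for illustration; the abstract states that the actual threshold follows from a derived SNR-uncertainty relation, which is not reproduced here.

```python
import math


def svm_margin_distance(weights, bias, sample):
    """Distance from a sample to a linear SVM decision boundary.

    A smaller distance means the sample is more uncertain.
    """
    score = sum(w * x for w, x in zip(weights, sample)) + bias
    norm = math.sqrt(sum(w * w for w in weights))
    return abs(score) / norm


def prediction_entropy(probs):
    """Shannon entropy (in nats) of a classifier's output distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def needs_retransmission(accumulated_snr, uncertainty, base_snr=1.0, scale=4.0):
    """Hypothetical threshold policy: more uncertain samples need higher SNR.

    `base_snr` and `scale` are illustrative parameters, not from the paper.
    """
    return accumulated_snr < base_snr + scale * uncertainty


distance = svm_margin_distance([3.0, 4.0], -5.0, [3.0, 4.0])
ambiguous = prediction_entropy([0.5, 0.5])   # maximally uncertain binary output
confident = prediction_entropy([1.0, 0.0])   # fully confident output

retry_ambiguous = needs_retransmission(2.0, ambiguous)
retry_confident = needs_retransmission(2.0, confident)
```

At the same accumulated SNR, the ambiguous sample is retransmitted while the confident one is not, which is the selective behavior that distinguishes importance ARQ from reliability-only ARQ.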

ITSep 2, 2018
Towards an Intelligent Edge: Wireless Communication Meets Machine Learning

Guangxu Zhu, Dongzhu Liu, Yuqing Du et al.

The recent revival of artificial intelligence (AI) is revolutionizing almost every branch of science and technology. Given the ubiquitous smart mobile gadgets and Internet of Things (IoT) devices, it is expected that a majority of intelligent applications will be deployed at the edge of wireless networks. This trend has generated strong interest in realizing an "intelligent edge" to support AI-enabled applications at various edge devices. Accordingly, a new research area, called edge learning, has emerged, which crosses and revolutionizes two disciplines: wireless communication and machine learning. A major theme in edge learning is to overcome the limited computing power, as well as limited data, at each edge device. This is accomplished by leveraging the mobile edge computing (MEC) platform and exploiting the massive data distributed over a large number of edge devices. In such systems, learning from distributed data and communicating between the edge server and devices are two critical and coupled aspects, and their fusion poses many new research challenges. This article advocates a new set of design principles for wireless communication in edge learning, collectively called learning-driven communication. Illustrative examples are provided to demonstrate the effectiveness of these design principles, and unique research opportunities are identified.

LGAug 7, 2018
Grassmannian Learning: Embedding Geometry Awareness in Shallow and Deep Learning

Jiayao Zhang, Guangxu Zhu, Robert W. Heath et al.

Modern machine learning algorithms have been adopted in a range of signal-processing applications spanning computer vision, natural language processing, and artificial intelligence. Many relevant problems involve subspace-structured features, orthogonality-constrained or low-rank-constrained objective functions, or subspace distances. These mathematical characteristics are expressed naturally using the Grassmann manifold. Unfortunately, this fact is not yet exploited in many traditional learning algorithms. In the last few years, there has been growing interest in studying the Grassmann manifold to tackle new learning problems. Such attempts have been encouraged by substantial performance improvements in both classic learning and learning using deep neural networks. We term the former shallow and the latter deep Grassmannian learning. The aim of this paper is to introduce the emerging area of Grassmannian learning by surveying common mathematical problems and primary solution approaches, and overviewing various applications. We hope to inspire practitioners in different fields to adopt the powerful tool of Grassmannian learning in their research.