Kezhi Wang

LG
h-index: 116
52 papers
1,705 citations
Novelty: 46%
AI Score: 56

52 Papers

AI Jul 7, 2023
Large AI Model-Based Semantic Communications

Feibo Jiang, Yubo Peng, Li Dong et al.

Semantic communication (SC) is an emerging intelligent paradigm, offering solutions for various future applications like metaverse, mixed reality, and the Internet of Everything. However, in current SC systems, the construction of the knowledge base (KB) faces several issues, including limited knowledge representation, frequent knowledge updates, and insecure knowledge sharing. Fortunately, the development of the large AI model (LAM) provides new solutions to overcome the above issues. Here, we propose a LAM-based SC framework (LAM-SC) specifically designed for image data, where we first apply the segment anything model (SAM)-based KB (SKB) that can split the original image into different semantic segments by universal semantic knowledge. Then, we present an attention-based semantic integration (ASI) to weigh the semantic segments generated by SKB without human participation and integrate them as the semantic aware image. Additionally, we propose an adaptive semantic compression (ASC) encoding to remove redundant information in semantic features, thereby reducing communication overhead. Finally, through simulations, we demonstrate the effectiveness of the LAM-SC framework and the possibility of applying the LAM-based KB in future SC paradigms.

AI Aug 29, 2023
LAMBO: Large AI Model Empowered Edge Intelligence

Li Dong, Feibo Jiang, Yubo Peng et al.

Next-generation edge intelligence is anticipated to benefit various applications via offloading techniques. However, traditional offloading architectures face several issues, including heterogeneous constraints, partial perception, uncertain generalization, and lack of tractability. In this paper, we propose a Large AI Model-Based Offloading (LAMBO) framework with over one billion parameters for solving these problems. We first use input embedding (IE) to achieve normalized feature representation with heterogeneous constraints and task prompts. Then, we introduce a novel asymmetric encoder-decoder (AED) as the decision-making model, which is an improved transformer architecture consisting of a deep encoder and a shallow decoder for global perception and decision. Next, actor-critic learning (ACL) is used to pre-train the AED for different optimization tasks under corresponding prompts, enhancing the AED's generalization in multi-task scenarios. Finally, we propose an active learning from expert feedback (ALEF) method to fine-tune the decoder of the AED for tracking changes in dynamic environments. Our simulation results validate the advantages of the proposed LAMBO framework.

NI Dec 26, 2022
Beyond 5G Networks: Integration of Communication, Computing, Caching, and Control

Musbahu Mohammed Adam, Liqiang Zhao, Kezhi Wang et al.

In recent years, the exponential proliferation of smart devices with their intelligent applications has posed severe challenges to conventional cellular networks. Such challenges can be potentially overcome by integrating communication, computing, caching, and control (i4C) technologies. In this survey, we first give a snapshot of different aspects of the i4C, comprising background, motivation, leading technological enablers, potential applications, and use cases. Next, we describe different models of communication, computing, caching, and control (4C) to lay the foundation of the integration approach. We review current state-of-the-art research efforts related to the i4C, focusing on recent trends of both conventional and artificial intelligence (AI)-based integration approaches. We also highlight the need for intelligence in resource integration. Then, we discuss integration of sensing and communication (ISAC) and classify the integration approaches into various classes. Finally, we propose open challenges and present future research directions for beyond 5G networks, such as 6G.

NI Mar 25
6D Movable Antenna for Internet of Vehicles: CSI-Free Dynamic Antenna Configuration

Maoxin Ji, Qiong Wu, Pingyi Fan et al.

Deploying six-dimensional movable antenna (6DMA) systems in Internet-of-Vehicles (IoV) scenarios can greatly enhance spectral efficiency. However, the high mobility of vehicles causes rapid spatio-temporal channel variations, posing a significant challenge to real-time 6DMA optimization. In this work, we pioneer the application of 6DMA in IoV and propose a low-complexity, instantaneous channel state information (CSI)-free dynamic configuration method. By integrating vehicle motion prediction with offline directional response priors, the proposed approach optimizes antenna positions and orientations at each reconfiguration epoch to maximize the average sum rate over a future time window. Simulation results in a typical urban intersection scenario demonstrate that the proposed 6DMA scheme significantly outperforms conventional fixed antenna arrays and simplified 6DMA baseline schemes in terms of total sum rate.

LG Jul 21, 2023
Training Latency Minimization for Model-Splitting Allowed Federated Edge Learning

Yao Wen, Guopeng Zhang, Kezhi Wang et al.

To alleviate the shortage of computing power faced by clients in training deep neural networks (DNNs) using federated learning (FL), we leverage edge computing and split learning to propose a model-splitting allowed FL (SFL) framework, with the aim of minimizing the training latency without loss of test accuracy. Under the synchronized global update setting, the latency to complete a round of global training is determined by the maximum latency for the clients to complete a local training session. Therefore, the training latency minimization problem (TLMP) is modelled as a min-max problem. To solve this mixed integer nonlinear programming problem, we first propose a regression method to fit the quantitative relationship between the cut-layer and other parameters of an AI model, and thus transform the TLMP into a continuous problem. Considering that the two subproblems involved in the TLMP, namely, the cut-layer selection problem for the clients and the computing resource allocation problem for the parameter server, are relatively independent, an alternate-optimization-based algorithm with polynomial time complexity is developed to obtain a high-quality solution to the TLMP. Extensive experiments are performed on a popular DNN model, EfficientNetV2, using the MNIST dataset, and the results verify the validity and improved performance of the proposed SFL framework.
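The minimize-the-maximum structure described in this abstract can be illustrated with a minimal Python sketch. The latency model (per-layer costs split at a cut layer, a fixed server speed per client), every name, and every number below are hypothetical and are not taken from the paper; the exhaustive search merely stands in for the regression and alternating optimization the authors describe.

```python
from itertools import product

# Hypothetical per-layer compute cost (arbitrary units) for a 4-layer model.
LAYER_COST = [4.0, 3.0, 2.0, 1.0]

def client_latency(cut, client_speed, server_speed):
    """Latency of one local session when layers [0, cut) run on the client
    and layers [cut, end) run on the server (assumed toy model)."""
    client_work = sum(LAYER_COST[:cut])
    server_work = sum(LAYER_COST[cut:])
    return client_work / client_speed + server_work / server_speed

def round_latency(cuts, client_speeds, server_shares):
    """Synchronized update: the round ends when the slowest client finishes."""
    return max(
        client_latency(c, s, share)
        for c, s, share in zip(cuts, client_speeds, server_shares)
    )

def best_cuts(client_speeds, server_shares):
    """Exhaustively pick the cut layers that minimize the maximum latency."""
    options = range(len(LAYER_COST) + 1)
    return min(
        product(options, repeat=len(client_speeds)),
        key=lambda cuts: round_latency(cuts, client_speeds, server_shares),
    )

speeds = [2.0, 0.5]   # hypothetical client compute speeds
shares = [1.0, 1.0]   # hypothetical server speed allotted to each client
cuts = best_cuts(speeds, shares)
print(cuts, round_latency(cuts, speeds, shares))
```

The exhaustive search grows exponentially with the number of clients, which is one reason a relaxation to a continuous problem, as the abstract mentions, is attractive.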

LG Dec 10, 2025
Federated Distillation Assisted Vehicle Edge Caching Scheme Based on Lightweight DDPM

Xun Li, Qiong Wu, Pingyi Fan et al.

Vehicle edge caching is a promising technology that can significantly reduce the latency for vehicle users (VUs) to access content by pre-caching user-interested content at edge nodes. It is crucial to accurately predict the content that VUs are interested in without exposing their privacy. Traditional federated learning (FL) can protect user privacy by sharing models rather than raw data. However, the training of FL requires frequent model transmission, which can result in significant communication overhead. Additionally, vehicles may leave the roadside unit (RSU) coverage area before training is completed, leading to training failures. To address these issues, in this letter, we propose a federated distillation-assisted vehicle edge caching scheme based on a lightweight denoising diffusion probabilistic model (LDPM). The simulation results demonstrate that the proposed vehicle edge caching scheme has good robustness to variations in vehicle speed, significantly reducing communication overhead and improving cache hit percentage.

IV May 26
TWIST: Closed-Loop Token Synchronization for Application-Aware Wireless Digital Twins

Sige Liu, Kezhi Wang

Wireless digital twins require repeated synchronization between a time-evolving physical scene and its digital counterpart under limited and time-varying communication resources. For perception-centric twins, pixel-domain transmission or uniformly protected bitstreams can be mismatched to the semantic state consumed by twin-side applications. This paper proposes TWIST, a closed-loop token synchronization framework for application-aware wireless digital twins. TWIST represents each physical observation as a token and synchronizes this state over a wireless link, rather than optimizing visual reconstruction. Token positions are grouped by task relevance and protected through mode-conditioned unequal error protection under low-, medium-, and high-synchronization modes. At the twin side, decoding confidence converts unreliable hard token decisions into erasures, which are restored by a completion model before updating the semantic twin state. The recovered state supports traffic-state inference and generates compact feedback statistics, including channel quality, receiver uncertainty, semantic drift, and application priority, for subsequent mode adaptation. Experiments on a dynamic road-scene digital-twin scenario show that TWIST improves traffic-state inference and semantic twin-state synchronization compared with fixed-mode and channel-only adaptation strategies, while reducing the average synchronization cost relative to always-high transmission.

CV Dec 4, 2025
WiFi-based Cross-Domain Gesture Recognition Using Attention Mechanism

Ruijing Liu, Cunhua Pan, Jiaming Zeng et al.

While fulfilling communication tasks, wireless signals can also be used to sense the environment. Among various types of sensing media, WiFi signals offer advantages such as widespread availability, low hardware cost, and strong robustness to environmental conditions like light, temperature, and humidity. By analyzing WiFi signals in the environment, it is possible to capture dynamic changes of the human body and accomplish sensing applications such as gesture recognition. However, many existing gesture sensing solutions perform well in-domain but lack cross-domain capabilities (i.e., recognition performance in untrained environments). To address this, we extract Doppler spectra from the channel state information (CSI) received by all receivers and concatenate each Doppler spectrum along the same time axis to generate fused images with multi-angle information as input features. Furthermore, inspired by the convolutional block attention module (CBAM), we propose a gesture recognition network that integrates a multi-semantic spatial attention mechanism with a self-attention-based channel mechanism. This network constructs attention maps to quantify the spatiotemporal features of gestures in images, enabling the extraction of key domain-independent features. Additionally, ResNet18 is employed as the backbone network to further capture deep-level features. To validate the network performance, we evaluate the proposed network on the public Widar3 dataset, and the results show that it not only maintains high in-domain accuracy of 99.72%, but also achieves high performance in cross-domain recognition of 97.61%, significantly outperforming the best existing solutions.

SP Apr 23
Robust Cross-Domain WiFi Fall Detection via Physics-Driven Attention-Enhanced Transformers

Yingzhe Wang, Cunhua Pan, Ruijing Liu et al.

Device-free fall detection utilizing WiFi Channel State Information (CSI) has emerged as a promising, privacy-preserving solution for elderly health monitoring in the Internet of Things (IoT) era. However, existing deep learning approaches suffer from severe performance degradation when deployed in unseen environments due to static background overfitting and Non-Line-of-Sight (NLoS) signal attenuation. To address these critical bottlenecks, we propose a robust, domain-generalizable framework featuring a novel Attention-Enhanced CNN-Transformer hybrid architecture. First, we design a physics-driven Dynamic Variance Gate (DVG) to dynamically calculate local temporal variance, acting as a soft-attention mask that eliminates static environmental DC components while amplifying dynamic human motion. Second, we introduce a Physics-Aware Data Augmentation strategy to force the network to learn invariant morphological signatures rather than environment-specific noise. Furthermore, a Convolutional Block Attention Module (CBAM) is integrated to refine spatiotemporal features prior to Transformer-based sequence modeling. Extensive cross-domain evaluations across four distinct indoor environments demonstrate that our method achieves 97.6% accuracy in NLoS scenarios and 98.8% in completely unseen environments without target-domain fine-tuning. Finally, we deploy the proposed framework on an edge computing system equipped with commercial WiFi NICs. Real-world live inference field tests confirm the system's robustness against unseen environmental layouts and its capability for continuous, low-latency whole-home safety monitoring.
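The idea of gating a signal by its local temporal variance can be sketched as follows. This is only an illustration under stated assumptions, not the paper's Dynamic Variance Gate: the trailing window, the `1 - exp(-variance / scale)` mapping, and the names `local_variance` and `variance_gate` are all hypothetical choices.

```python
import math

def local_variance(signal, window):
    """Population variance over each trailing window of up to `window`
    samples (the window is shorter at the start of the signal)."""
    out = []
    for i in range(len(signal)):
        seg = signal[max(0, i - window + 1): i + 1]
        mean = sum(seg) / len(seg)
        out.append(sum((x - mean) ** 2 for x in seg) / len(seg))
    return out

def variance_gate(signal, window=4, scale=1.0):
    """Soft mask in [0, 1): zero where the signal is locally static,
    approaching one where the local variance is large (assumed mapping)."""
    mask = [1.0 - math.exp(-v / scale) for v in local_variance(signal, window)]
    return [m * x for m, x in zip(mask, signal)]

static = [5.0] * 6                           # constant background only
moving = [5.0, 5.0, 9.0, 1.0, 8.0, 2.0]      # background plus motion
print(variance_gate(static))
print(variance_gate(moving))
```

A constant input has zero local variance everywhere, so the mask suppresses it entirely, while samples in fluctuating regions pass through with a weight closer to one.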

IT Mar 24
Aerial Agentic AI: Synergizing LLM and SLM for Low-Altitude Wireless Networks

Li Dong, Feibo Jiang, Kezhi Wang et al.

Low-Altitude Wireless Networks (LAWNs), composed of Unmanned Aerial Vehicles (UAVs) and mobile terminals, are emerging as a critical extension of 6G. However, applying Large Language Models in LAWNs faces three major challenges: 1) Computational and energy constraints; 2) Communication and bandwidth limitations; 3) Real-time and reliability conflicts. To address these challenges, we propose Aerial Agentic AI, a hierarchical framework integrating UAV-side fast-thinking Small Language Models (SLMs) with BS-side slow-thinking Large Language Models (LLMs). First, we design SLM-based Agents capable of on-board perception, short-term memory enhancement, and real-time decision-making on the UAVs. Second, we implement an LLM-based Agent system that leverages long-term memory, global knowledge, and tool orchestration at the Base Station (BS) to perform deep reasoning, knowledge updates, and strategy optimization. Third, we establish an efficient hierarchical coordination mechanism, enabling UAVs to execute high-frequency tasks locally while synchronizing with the BS only when necessary. Experimental results validate the effectiveness of the proposed Aerial Agentic AI.

NI Mar 21
A Unified Cloud-Edge-Terminal Framework for Multimodal Integrated Sensing and Communication

Yubo Peng, Luping Xiang, Kun Yang et al.

The transition to 6G calls for tightly integrated sensing and communication to support mission-critical services such as autonomous driving, embodied AI, and high-precision telemedicine. However, most existing ISAC designs rely on a single sensing modality (often RF), which limits environmental understanding and becomes a bottleneck in complex and dynamic scenes. This motivates a shift from single-modal to multimodal ISAC, where heterogeneous sensors (e.g., radar, LiDAR, and cameras) complement each other to improve robustness and semantic awareness. In this article, we first summarize key challenges for multimodal ISAC, including heterogeneous fusion, communication overhead, and scalable system design. We then highlight three enabling technologies: large AI models, semantic communications, and multi-agent systems, and discuss how their combination can enable task-oriented multimodal perception. Building on these insights, we propose a unified cloud-edge-terminal (CET) framework that hierarchically distributes intelligence and supports three adaptive operation modes: global fusion mode (GFM), cooperative relay mode (CRM), and peer interaction mode (PIM). A case study evaluates the framework across three modes, demonstrating that GFM achieves the highest accuracy, PIM minimizes latency, and CRM strikes an optimal balance between performance and efficiency. Finally, we conclude with open research issues and future directions.

LG May 20
TONIC: Token-Centric Semantic Communication for Task-Oriented Wireless Systems

Sige Liu, Kezhi Wang

Tokens are becoming the basic units through which foundation models represent and process information for understanding and inference. However, traditional wireless communication, centered on bit-level fidelity, faces a mismatch between what is transmitted reliably and what downstream models actually consume. This mismatch calls for a communication design that directly accounts for token-level task relevance and downstream model requirements, rather than treating all transmitted bits as equally important. In this paper, we propose TONIC, a token-centric semantic communication framework for task-oriented wireless systems. The transmitter converts each source sample into a sequence of tokens, estimates token-level task relevance, and allocates protection through utility-aware unequal error protection under a fixed channel-use budget. At the receiver, token-level confidence is used to gate unreliable decisions, turning harmful substitutions into recoverable erasures before a Transformer-based completion model restores the masked tokens for final task inference. Our framework combines transmitter-side semantic-aware protection with receiver-side confidence-aware gating in a modular and interpretable architecture, rather than relying solely on fully black-box end-to-end learning. We further establish a utility-aware Bayes-risk interpretation for the receiver-side gating rule and study its interaction with unequal protection and completion. Experimental results on image classification show that TONIC consistently outperforms separation-based schemes, the pixel-domain DeepJSCC baseline, and token-domain baselines under matched communication budgets over AWGN, Rayleigh, and Rician channels.
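The receiver-side gating step described in this abstract, which turns unreliable hard decisions into erasures before completion, can be sketched in a few lines of Python. The threshold value, the `None` erasure marker, and the trivial `complete` stand-in are assumptions made for illustration; the paper uses a Transformer-based completion model and a utility-aware gating rule that this sketch does not reproduce.

```python
ERASURE = None  # marker for a token the receiver declines to decide

def gate_tokens(decisions, confidences, threshold=0.8):
    """Keep a hard token decision only when its confidence reaches the
    threshold; otherwise replace it with an erasure."""
    if len(decisions) != len(confidences):
        raise ValueError("decisions and confidences must align")
    return [
        tok if conf >= threshold else ERASURE
        for tok, conf in zip(decisions, confidences)
    ]

def complete(tokens, fill_token):
    """Stand-in for the completion model: fills every erasure with one
    fixed token. A real system would predict each masked token."""
    return [fill_token if tok is ERASURE else tok for tok in tokens]

received = [17, 4, 99, 23]               # hypothetical hard token decisions
confidence = [0.95, 0.40, 0.88, 0.10]    # hypothetical decoding confidences
gated = gate_tokens(received, confidence)
restored = complete(gated, fill_token=0)
print(gated)
print(restored)
```

The design point is that an erasure tells the completion step which positions to restore, whereas a confidently wrong substitution would pass through unnoticed.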

LG Dec 10, 2025
Semantic-Aware Cooperative Communication and Computation Framework in Vehicular Networks

Jingbo Zhang, Maoxin Ji, Qiong Wu et al.

Semantic Communication (SC) combined with Vehicular Edge Computing (VEC) provides an efficient edge task processing paradigm for the Internet of Vehicles (IoV). Focusing on highway scenarios, this paper proposes a Tripartite Cooperative Semantic Communication (TCSC) framework, which enables Vehicle Users (VUs) to perform semantic task offloading via Vehicle-to-Infrastructure (V2I) and Vehicle-to-Vehicle (V2V) communications. Considering task latency and the number of semantic symbols, the framework constructs a Mixed-Integer Nonlinear Programming (MINLP) problem, which is transformed into two subproblems. First, we propose a multi-agent proximal policy optimization task offloading optimization method based on parametric distribution noise (MAPPO-PDN) to solve the optimization problem of the number of semantic symbols; second, linear programming (LP) is used to solve for the offloading ratio. Simulations show that the performance of this scheme is superior to that of other algorithms.

NI Feb 4
LLM-Empowered Cooperative Content Caching in Vehicular Fog Caching-Assisted Platoon Networks

Bowen Tan, Qiong Wu, Pingyi Fan et al.

This letter proposes a novel three-tier content caching architecture for Vehicular Fog Caching (VFC)-assisted platoons, where the VFC is formed by the vehicles driving near the platoon. The system strategically coordinates storage across local platoon vehicles, dynamic VFC clusters, and the cloud server (CS) to minimize content retrieval latency. To efficiently manage distributed storage, we integrate large language models (LLMs) for real-time and intelligent caching decisions. The proposed approach leverages LLMs' ability to process heterogeneous information, including user profiles, historical data, content characteristics, and dynamic system states. Through a designed prompting framework encoding task objectives and caching constraints, the LLMs formulate caching as a decision-making task, and our hierarchical deterministic caching mapping strategy enables adaptive request prediction and precise content placement across three tiers without frequent retraining. Simulation results demonstrate the advantages of our proposed caching scheme.

AI Nov 30, 2025
SemAgent: Semantic-Driven Agentic AI Empowered Trajectory Prediction in Vehicular Networks

Lin Zhu, Kezhi Wang, Luping Xiang et al.

Efficient information exchange and reliable contextual reasoning are essential for vehicle-to-everything (V2X) networks. Conventional communication schemes often incur significant transmission overhead and latency, while existing trajectory prediction models generally lack environmental perception and logical inference capabilities. This paper presents a trajectory prediction framework that integrates semantic communication with Agentic AI to enhance predictive performance in vehicular environments. In vehicle-to-infrastructure (V2I) communication, a feature-extraction agent at the Roadside Unit (RSU) derives compact representations from historical vehicle trajectories, followed by semantic reasoning performed by a semantic-analysis agent. The RSU then transmits both feature representations and semantic insights to the target vehicle via semantic communication, enabling the vehicle to predict future trajectories by combining received semantics with its own historical data. In vehicle-to-vehicle (V2V) communication, each vehicle performs local feature extraction and semantic analysis while receiving predicted trajectories from neighboring vehicles, and jointly utilizes this information for its own trajectory prediction. Extensive experiments across diverse communication conditions demonstrate that the proposed method significantly outperforms baseline schemes, achieving up to a 47.5% improvement in prediction accuracy under low signal-to-noise ratio (SNR) conditions.

AI Jan 27
ComAgent: Multi-LLM based Agentic AI Empowered Intelligent Wireless Networks

Haoyun Li, Ming Xiao, Kezhi Wang et al.

Emerging 6G networks rely on complex cross-layer optimization, yet manually translating high-level intents into mathematical formulations remains a bottleneck. While Large Language Models (LLMs) offer promise, monolithic approaches often lack sufficient domain grounding, constraint awareness, and verification capabilities. To address this, we present ComAgent, a multi-LLM agentic AI framework. ComAgent employs a closed-loop Perception-Planning-Action-Reflection cycle, coordinating specialized agents for literature search, coding, and scoring to autonomously generate solver-ready formulations and reproducible simulations. By iteratively decomposing problems and self-correcting errors, the framework effectively bridges the gap between user intent and execution. Evaluations demonstrate that ComAgent achieves expert-comparable performance in complex beamforming optimization and outperforms monolithic LLMs across diverse wireless tasks, highlighting its potential for automating design in emerging wireless networks.

NI May 12
Toward Communication-Efficient Space Data Centers: Bottlenecks, Architectures, and New Paradigms

Minghao Sun, Zehui Chen, Jinbo Hou et al.

The rapid growth of foundation model training and large-scale AI services has driven ground data centers toward unprecedented power densities, intensifying challenges in energy supply, cooling, and spatial scalability. Space Data Centers (SDCs) have emerged as a promising paradigm for hosting energy-intensive computing infrastructures in orbit, leveraging continuous solar energy and radiative cooling advantages. However, unlike ground facilities primarily constrained by power and site availability, SDCs are fundamentally limited by communication capability. The gap between petabit-scale internal data exchange in ground data centers and the gigabit-scale capacity of ground-space links forms a critical bottleneck. This article systematically analyzes communication constraints in SDC architectures and explores semantic communication as a key enabling paradigm. By transmitting compact, task-relevant semantic representations instead of raw data, uplink pressure can be substantially reduced. The feasibility of communication-efficient orbital AI infrastructures is demonstrated through the evaluation of a multi-layer heterogeneous SDC framework consisting of relay satellites and orbital computing nodes operating under coupled energy and thermal constraints. The article further outlines open research challenges toward scalable deployment.

AI Sep 30, 2024
Efficient Driving Behavior Narration and Reasoning on Edge Device Using Large Language Models

Yizhou Huang, Yihua Cheng, Kezhi Wang

Deep learning architectures with powerful reasoning capabilities have driven significant advancements in autonomous driving technology. Large language models (LLMs) applied in this field can describe driving scenes and behaviors with a level of accuracy similar to human perception, particularly in visual tasks. Meanwhile, the rapid development of edge computing, with its advantage of proximity to data sources, has made edge devices increasingly important in autonomous driving. Edge devices process data locally, reducing transmission delays and bandwidth usage, and achieving faster response times. In this work, we propose a driving behavior narration and reasoning framework that applies LLMs to edge devices. The framework consists of multiple roadside units, with LLMs deployed on each unit. These roadside units collect road data and communicate via 5G NSR/NR networks. Our experiments show that LLMs deployed on edge devices can achieve satisfactory response speeds. Additionally, we propose a prompt strategy to enhance the narration and reasoning performance of the system. This strategy integrates multi-modal information, including environmental, agent, and motion data. Experiments conducted on the OpenDV-Youtube dataset demonstrate that our approach significantly improves performance across both tasks.

AI Dec 13, 2023
Large Language Model Enhanced Multi-Agent Systems for 6G Communications

Feibo Jiang, Li Dong, Yubo Peng et al.

The rapid development of the Large Language Model (LLM) presents huge opportunities for 6G communications, e.g., network optimization and management, by allowing users to input task requirements to LLMs in natural language. However, directly applying native LLMs in 6G encounters various challenges, such as a lack of private communication data and knowledge and limited logical reasoning, evaluation, and refinement abilities. Integrating LLMs with the capabilities of retrieval, planning, memory, evaluation and reflection in agents can greatly enhance the potential of LLMs for 6G communications. To this end, we propose a multi-agent system with customized communication knowledge and tools for solving communication-related tasks using natural language, comprising three components: (1) Multi-agent Data Retrieval (MDR), which employs the condensate and inference agents to refine and summarize communication knowledge from the knowledge base, expanding the knowledge boundaries of LLMs in 6G communications; (2) Multi-agent Collaborative Planning (MCP), which utilizes multiple planning agents to generate feasible solutions for the communication-related task from different perspectives based on the retrieved knowledge; (3) Multi-agent Evaluation and Reflexion (MER), which utilizes the evaluation agent to assess the solutions, and applies the reflexion agent and refinement agent to provide improvement suggestions for current solutions. Finally, we validate the effectiveness of the proposed multi-agent system by designing a semantic communication system, as a case study of 6G communications.

NI May 5
Single-Step Six-Dimensional Movable Antenna Reconfiguration for High-Mobility IoV: Modeling, Analysis, and Optimization

Maoxin Ji, Qiong Wu, Pingyi Fan et al.

The Six-Dimensional Movable Antenna (6DMA) system has emerged as a promising technology to enhance wireless capacity by fully exploiting spatial degrees of freedom. However, applying 6DMA to high-mobility Internet of Vehicles (IoV) scenarios faces significant challenges, primarily due to the difficulty of acquiring instantaneous Channel State Information (CSI) and the risk of service interruptions caused by mechanical reconfiguration delays. To address these issues, this paper proposes a low-complexity, CSI-free single-step reconfiguration framework. First, we design a deterministic discrete position generation scheme based on a latitude-longitude grid with inherent topological structures. Leveraging graph theory, we explicitly model and theoretically derive the lower bounds of movement and time costs for antenna reconfiguration. Subsequently, utilizing the directional sparsity of 6DMA channels, we develop an adaptive optimization strategy that fuses offline environmental priors with online historical feedback. Furthermore, a periodic reconfiguration mechanism based on predicted cumulative vehicle distributions is introduced. By strictly restricting antenna adjustments to the first-order spatial neighborhood, the proposed single-step method effectively eliminates service interruptions. Simulation results demonstrate that the proposed scheme significantly outperforms traditional fixed and global-search-based benchmarks in terms of uplink sum rate, while incurring negligible mechanical overhead and latency, thereby validating its feasibility and robustness in highly dynamic vehicular networks.

CV Mar 1
FoSS: Modeling Long Range Dependencies and Multimodal Uncertainty in Trajectory Prediction via Fourier State Space Integration

Yizhou Huang, Gengze Jiang, Yihua Cheng et al.

Accurate trajectory prediction is vital for safe autonomous driving, yet existing approaches struggle to balance modeling power and computational efficiency. Attention-based architectures incur quadratic complexity with increasing agents, while recurrent models struggle to capture long-range dependencies and fine-grained local dynamics. Building upon this, we present FoSS, a dual-branch framework that unifies frequency-domain reasoning with linear-time sequence modeling. The frequency-domain branch performs a discrete Fourier transform to decompose trajectories into amplitude components encoding global intent and phase components capturing local variations, followed by a progressive helix reordering module that preserves spectral order; two selective state-space submodules, Coarse2Fine-SSM and SpecEvolve-SSM, refine spectral features with O(N) complexity. In parallel, a time-domain dynamic selective SSM reconstructs self-attention behavior in linear time to retain long-range temporal context. A cross-attention layer fuses temporal and spectral representations, while learnable queries generate multiple candidate trajectories, and a weighted fusion head expresses motion uncertainty. Experiments on Argoverse 1 and Argoverse 2 benchmarks demonstrate that FoSS achieves state-of-the-art accuracy while reducing computation by 22.5% and parameters by over 40%. Comprehensive ablations confirm the necessity of each component.

LG Nov 15, 2025
SenseRay-3D: Generalizable and Physics-Informed Framework for End-to-End Indoor Propagation Modeling

Yu Zheng, Kezhi Wang, Wenji Xi et al.

Modeling indoor radio propagation is crucial for wireless network planning and optimization. However, existing approaches often rely on labor-intensive manual modeling of geometry and material properties, resulting in limited scalability and efficiency. To overcome these challenges, this paper presents SenseRay-3D, a generalizable and physics-informed end-to-end framework that predicts three-dimensional (3D) path-loss heatmaps directly from RGB-D scans, thereby eliminating the need for explicit geometry reconstruction or material annotation. The proposed framework builds a sensing-driven voxelized scene representation that jointly encodes occupancy, electromagnetic material characteristics, and transmitter-receiver geometry, which is processed by a SwinUNETR-based neural network to infer environmental path-loss relative to free-space path-loss. A comprehensive synthetic indoor propagation dataset is further developed to validate the framework and to serve as a standardized benchmark for future research. Experimental results show that SenseRay-3D achieves a mean absolute error of 4.27 dB on unseen environments and supports real-time inference at 217 ms per sample, demonstrating its scalability, efficiency, and physical consistency. SenseRay-3D paves a new path for sense-driven, generalizable, and physics-consistent modeling of indoor propagation, marking a major leap beyond our pioneering EM DeepRay framework.

AIMay 28, 2025
From Large AI Models to Agentic AI: A Tutorial on Future Intelligent Communications

Feibo Jiang, Cunhua Pan, Li Dong et al.

With the advent of 6G communications, intelligent communication systems face multiple challenges, including constrained perception and response capabilities, limited scalability, and low adaptability in dynamic environments. This tutorial provides a systematic introduction to the principles, design, and applications of Large Artificial Intelligence Models (LAMs) and Agentic AI technologies in intelligent communication systems, aiming to offer researchers a comprehensive overview of cutting-edge technologies and practical guidance. First, we outline the background of 6G communications, review the technological evolution from LAMs to Agentic AI, and clarify the tutorial's motivation and main contributions. Subsequently, we present a comprehensive review of the key components required for constructing LAMs. We further categorize LAMs and analyze their applicability, covering Large Language Models (LLMs), Large Vision Models (LVMs), Large Multimodal Models (LMMs), Large Reasoning Models (LRMs), and lightweight LAMs. Next, we propose a LAM-centric design paradigm tailored for communications, encompassing dataset construction and both internal and external learning approaches. Building upon this, we develop an LAM-based Agentic AI system for intelligent communications, clarifying its core components such as planners, knowledge bases, tools, and memory modules, as well as its interaction mechanisms. We also introduce a multi-agent framework with data retrieval, collaborative planning, and reflective evaluation for 6G. Subsequently, we provide a detailed overview of the applications of LAMs and Agentic AI in communication scenarios. Finally, we summarize the research challenges and future directions in current studies, aiming to support the development of efficient, secure, and sustainable next-generation intelligent communication systems.

LGApr 20, 2024
Personalized Wireless Federated Learning for Large Language Models

Feibo Jiang, Li Dong, Siwei Tu et al.

Large language models (LLMs) have driven profound transformations in wireless networks. However, within wireless environments, the training of LLMs faces significant challenges related to security and privacy. Federated Learning (FL), with its decentralized architecture, offers enhanced data privacy protection. Nevertheless, when integrated with LLMs, FL still struggles with several critical limitations, including large-scale and heterogeneous data, resource-intensive training, and substantial communication overhead. To address these challenges, this paper first presents a systematic analysis of the distinct training stages of LLMs in wireless networks, including pre-training, instruction tuning, and alignment tuning. Building upon this foundation, we propose a Personalized Wireless Federated Fine-tuning (PWFF) framework. Initially, we utilize the adapter and Low-Rank Adaptation (LoRA) techniques to decrease energy consumption, while employing global partial aggregation to reduce communication delay. Subsequently, we develop two reward models and design a personalized loss function to fulfill the goal of personalized learning. Furthermore, we implement a local multi-objective alignment to ensure the stability and effectiveness of the FL process. Finally, we conduct a series of simulations to validate the performance of the proposed PWFF method and provide an in-depth discussion of the open issues.
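
To show why LoRA reduces the trainable (and hence communicated) parameter count, here is a minimal NumPy sketch of a single low-rank adapted layer. The dimensions, rank, and scaling value are arbitrary assumptions, and this is not the PWFF code.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank, alpha = 8, 16, 2, 4.0     # illustrative sizes only

W = rng.normal(size=(d_out, d_in))             # frozen pre-trained weight
A = rng.normal(scale=0.01, size=(rank, d_in))  # trainable, small random init
B = np.zeros((d_out, rank))                    # trainable, zero init

def lora_forward(x, W, A, B, alpha, rank):
    """y = W x + (alpha / rank) * B (A x); only A and B would be updated."""
    return W @ x + (alpha / rank) * (B @ (A @ x))

x = rng.normal(size=d_in)
y = lora_forward(x, W, A, B, alpha, rank)

full_params = W.size            # parameters in the frozen weight
lora_params = A.size + B.size   # parameters a client would train and share
```

With `B` initialized to zero, the adapted layer starts out identical to the frozen layer, and only the low-rank factors would need to be exchanged during aggregation.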

NIMar 6, 2025
Large-Scale AI in Telecom: Charting the Roadmap for Innovation, Scalability, and Enhanced Digital Experiences

Adnan Shahid, Adrian Kliks, Ahmed Al-Tahmeesschi et al.

This white paper discusses the role of large-scale AI in the telecommunications industry, with a specific focus on the potential of generative AI to revolutionize network functions and user experiences, especially in the context of 6G systems. It highlights the development and deployment of Large Telecom Models (LTMs), which are tailored AI models designed to address the complex challenges faced by modern telecom networks. The paper covers a wide range of topics, from the architecture and deployment strategies of LTMs to their applications in network management, resource allocation, and optimization. It also explores the regulatory, ethical, and standardization considerations for LTMs, offering insights into their future integration into telecom infrastructure. The goal is to provide a comprehensive roadmap for the adoption of LTMs to enhance scalability, performance, and user-centric innovation in telecom networks.

ITMar 9, 2024
Large Generative Model Assisted 3D Semantic Communication

Feibo Jiang, Yubo Peng, Li Dong et al.

Semantic Communication (SC) is a novel paradigm for data transmission in 6G. However, there are several challenges posed when performing SC in 3D scenarios: 1) 3D semantic extraction; 2) Latent semantic redundancy; and 3) Uncertain channel estimation. To address these issues, we propose a Generative AI Model assisted 3D SC (GAM-3DSC) system. Firstly, we introduce a 3D Semantic Extractor (3DSE), which employs generative AI models, including Segment Anything Model (SAM) and Neural Radiance Field (NeRF), to extract key semantics from a 3D scenario based on user requirements. The extracted 3D semantics are represented as multi-perspective images of the goal-oriented 3D object. Then, we present an Adaptive Semantic Compression Model (ASCM) for encoding these multi-perspective images, in which we use a semantic encoder with two output heads to perform semantic encoding and mask redundant semantics in the latent semantic space, respectively. Next, we design a conditional Generative adversarial network and Diffusion model aided-Channel Estimation (GDCE) to estimate and refine the Channel State Information (CSI) of physical channels. Finally, simulation results demonstrate the advantages of the proposed GAM-3DSC system in effectively transmitting the goal-oriented 3D scenario.

AIFeb 5
Reasoning-guided Collaborative Filtering with Language Models for Explainable Recommendation

Fahad Anwaar, Adil Mehmood Khan, Muhammad Khalid et al.

Large Language Models (LLMs) exhibit potential for explainable recommendation systems but overlook collaborative signals, while prevailing methods treat recommendation and explanation as separate tasks, resulting in an increased memory footprint. We present RGCF-XRec, a hybrid framework that introduces reasoning-guided collaborative filtering (CF) knowledge into a language model to deliver explainable sequential recommendations in a single step. Theoretical grounding and empirical findings reveal that RGCF-XRec offers three key merits over leading CF-aware LLM-based methods: (1) reasoning-guided augmentation of CF knowledge through contextual prompting to discover latent preferences and interpretable reasoning paths; (2) an efficient scoring mechanism based on four dimensions (coherence, completeness, relevance, and consistency) to mitigate noisy CF reasoning traces and retain high-quality explanations; (3) a unified representation learning network that encodes collaborative and semantic signals, enabling a structured prompt to condition the LLM for explainable sequential recommendation. RGCF-XRec demonstrates consistent improvements across the Amazon Sports, Toys, and Beauty datasets, comprising 642,503 user-item interactions. It improves HR@10 by 7.38% in Sports and 4.59% in Toys, along with ROUGE-L by 8.02% and 3.49%, respectively. It reduces the cold-warm performance gap, achieving overall gains of 14.5% in cold-start and 11.9% in warm-start scenarios, and enhances zero-shot HR@5 by 18.54% in Beauty and 23.16% in Toys, highlighting effective generalization and robustness. Moreover, RGCF-XRec achieves training efficiency with a lightweight LLaMA 3.2-3B backbone, ensuring scalability for real-world applications.
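
For readers unfamiliar with the HR@k metric quoted above, a minimal sketch follows. It uses the common definition (the fraction of users whose held-out item appears in the top k of the ranked list); the function name and toy data are assumptions, and the paper's exact evaluation protocol is not reproduced here.

```python
def hit_ratio_at_k(ranked_lists, targets, k):
    """Fraction of users whose target item appears among their top-k items."""
    hits = sum(
        1 for ranked, target in zip(ranked_lists, targets) if target in ranked[:k]
    )
    return hits / len(targets)

# Toy data (assumed): two users, three ranked candidates each.
ranked_lists = [["a", "b", "c"], ["d", "e", "f"]]
targets = ["b", "f"]

hr_at_2 = hit_ratio_at_k(ranked_lists, targets, k=2)
hr_at_3 = hit_ratio_at_k(ranked_lists, targets, k=3)
```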

LGMar 11, 2025
SIMAC: A Semantic-Driven Integrated Multimodal Sensing And Communication Framework

Yubo Peng, Luping Xiang, Kun Yang et al.

Traditional single-modality sensing faces limitations in accuracy and capability, and its decoupled implementation with communication systems increases latency in bandwidth-constrained environments. Additionally, single-task-oriented sensing systems fail to address users' diverse demands. To overcome these challenges, we propose a semantic-driven integrated multimodal sensing and communication (SIMAC) framework. This framework leverages a joint source-channel coding architecture to achieve simultaneous sensing decoding and transmission of sensing results. Specifically, SIMAC first introduces a multimodal semantic fusion (MSF) network, which employs two extractors to extract semantic information from radar signals and images, respectively. MSF then applies cross-attention mechanisms to fuse these unimodal features and generate multimodal semantic representations. Secondly, we present a large language model (LLM)-based semantic encoder (LSE), where relevant communication parameters and multimodal semantics are mapped into a unified latent space and input to the LLM, enabling channel-adaptive semantic encoding. Thirdly, a task-oriented sensing semantic decoder (SSD) is proposed, in which different decoded heads are designed according to the specific needs of tasks. Simultaneously, a multi-task learning strategy is introduced to train the SIMAC framework, achieving diverse sensing services. Finally, experimental simulations demonstrate that the proposed framework achieves diverse sensing services and higher accuracy.

LGNov 7, 2024
Semantic-Aware Resource Management for C-V2X Platooning via Multi-Agent Reinforcement Learning

Wenjun Zhang, Qiong Wu, Pingyi Fan et al.

Semantic communication transmits the extracted features of information rather than raw data, significantly reducing redundancy, which is crucial for addressing spectrum and energy challenges in 6G networks. In this paper, we introduce semantic communication into a cellular vehicle-to-everything (C-V2X)-based autonomous vehicle platoon system for the first time, aiming to achieve efficient management of communication resources in a dynamic environment. Firstly, we construct a mathematical model for semantic communication in platoon systems, in which the DeepSC model and MU-DeepSC model are used to semantically encode and decode unimodal and multi-modal data, respectively. Then, we propose the quality of experience (QoE) metric based on semantic similarity and semantic rate. Meanwhile, we consider the success rate of semantic information transmission (SRS) metric to ensure the fairness of channel resource allocation. Next, the optimization problem is posed with the aim of maximizing the QoE in vehicle-to-vehicle (V2V) links while improving SRS. To solve this mixed-integer nonlinear programming (MINLP) problem and adapt to time-varying channel conditions, the paper proposes a distributed semantic-aware multi-modal resource allocation (SAMRA) algorithm based on multi-agent reinforcement learning (MARL), referred to as SAMRAMARL. The algorithm can dynamically allocate channels and power and determine semantic symbol length based on the contextual importance of the transmitted information, ensuring efficient resource utilization. Finally, extensive simulations demonstrate that SAMRAMARL outperforms existing methods, achieving significant gains in QoE, SRS, and communication delay in C-V2X platooning scenarios.

CVMar 13, 2025
Trajectory Mamba: Efficient Attention-Mamba Forecasting Model Based on Selective SSM

Yizhou Huang, Yihua Cheng, Kezhi Wang

Motion prediction is crucial for autonomous driving, as it enables accurate forecasting of future vehicle trajectories based on historical inputs. This paper introduces Trajectory Mamba, a novel efficient trajectory prediction framework based on the selective state-space model (SSM). Conventional attention-based models face the challenge of computational costs that grow quadratically with the number of targets, hindering their application in highly dynamic environments. In response, we leverage the SSM to redesign the self-attention mechanism in the encoder-decoder architecture, thereby achieving linear time complexity. To address the potential reduction in prediction accuracy resulting from modifications to the attention mechanism, we propose a joint polyline encoding strategy to better capture the associations between static and dynamic contexts, ultimately enhancing prediction accuracy. Additionally, to balance prediction accuracy and inference speed, we adopt a decoder that differs entirely from the encoder. Through cross-state space attention, all target agents share the scene context, allowing the SSM to interact with the shared scene representation during decoding, thus inferring different trajectories over the next prediction steps. Our model achieves state-of-the-art results in terms of inference speed and parameter efficiency on both the Argoverse 1 and Argoverse 2 datasets. It demonstrates a four-fold reduction in FLOPs compared to existing methods and reduces parameter count by over 40% while surpassing the performance of the vast majority of previous methods. These findings validate the effectiveness of Trajectory Mamba in trajectory prediction tasks.
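
The linear-time behaviour of a selective SSM comes from processing the sequence with a single recurrent pass whose transition depends on the current input. The sketch below is a heavily simplified, single-channel, diagonal-state illustration of that idea. It is not the Trajectory Mamba architecture, and every parameter name (`a`, `w_delta`, `w_b`, `w_c`) is hypothetical.

```python
import numpy as np

def softplus(z):
    """Smooth positive mapping used for the step size."""
    return np.log1p(np.exp(z))

def selective_scan(x, a, w_delta, w_b, w_c):
    """One pass over a 1-D sequence x of length T; cost grows linearly in T.

    a:        (N,) negative diagonal state matrix (assumed fixed)
    w_delta:  scalar weight producing an input-dependent step size
    w_b, w_c: (N,) weights producing input-dependent input/output projections
    """
    h = np.zeros_like(a)
    y = np.empty(x.shape[0])
    for t in range(x.shape[0]):
        delta = softplus(w_delta * x[t])   # input-dependent step size
        a_bar = np.exp(delta * a)          # discretized decay, in (0, 1) for a < 0
        b_bar = delta * (w_b * x[t])       # input-dependent input projection
        h = a_bar * h + b_bar * x[t]       # state update
        y[t] = np.dot(w_c * x[t], h)       # input-dependent readout
    return y

a = -np.array([0.5, 1.0])
w_b = np.array([1.0, -1.0])
w_c = np.array([0.3, 0.7])
y = selective_scan(np.linspace(-1.0, 1.0, 10), a, 1.0, w_b, w_c)
y_zero = selective_scan(np.zeros(5), a, 1.0, w_b, w_c)
```

Each step touches only the fixed-size state `h`, so the work per token is constant, in contrast to self-attention, where each token attends to all others.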

LGDec 28, 2024
Explainable Semantic Federated Learning Enabled Industrial Edge Network for Fire Surveillance

Li Dong, Yubo Peng, Feibo Jiang et al.

In fire surveillance, Industrial Internet of Things (IIoT) devices must frequently transmit large volumes of monitoring data, which consumes substantial spectrum resources. Hence, we propose an Industrial Edge Semantic Network (IESN) that allows IIoT devices to send warnings through semantic communication (SC). This raises three considerations: (1) data privacy and security; (2) SC model adaptation for heterogeneous devices; and (3) explainability of semantics. First, we present an eXplainable Semantic Federated Learning (XSFL) to train the SC model, thus ensuring data privacy and security. Then, we present an Adaptive Client Training (ACT) strategy to provide a specific SC model for each device according to its Fisher information matrix, thus overcoming the heterogeneity. Next, an Explainable SC (ESC) mechanism is designed, which introduces a leakyReLU-based activation mapping to explain the relationship between the extracted semantics and monitoring data. Finally, simulation results demonstrate the effectiveness of XSFL.

LGJan 14, 2025
Enhanced SPS Velocity-adaptive Scheme: Access Fairness in 5G NR V2I Networks

Xiao Xu, Qiong Wu, Pingyi Fan et al.

Vehicle-to-Infrastructure (V2I) technology enables information exchange between vehicles and road infrastructure. Specifically, when a vehicle approaches a roadside unit (RSU), it can exchange information with the RSU to obtain accurate data that assists in driving. With the release of the 3rd Generation Partnership Project (3GPP) Release 16, which includes the 5G New Radio (NR) Vehicle-to-Everything (V2X) standards, vehicles typically adopt mode-2 communication using sensing-based semi-persistent scheduling (SPS) for resource allocation. In this approach, vehicles identify candidate resources within a selection window and exclude ineligible resources based on information from a sensing window. However, vehicles often drive at different speeds, resulting in varying amounts of data transmission with RSUs as they pass by, which leads to unfair access. Therefore, it is essential to design an access scheme that accounts for different vehicle speeds to achieve fair access across the network. This paper formulates an optimization problem for vehicular networks and proposes a multi-objective optimization scheme to address it by adjusting the selection window in the SPS mechanism of 5G NR V2I mode-2. Simulation results demonstrate the effectiveness of the proposed scheme.

CVMar 10, 2025
Semantic Communications with Computer Vision Sensing for Edge Video Transmission

Yubo Peng, Luping Xiang, Kun Yang et al.

Despite the widespread adoption of vision sensors in edge applications, such as surveillance, the transmission of video data consumes substantial spectrum resources. Semantic communication (SC) offers a solution by extracting and compressing information at the semantic level, preserving the accuracy and relevance of transmitted data while significantly reducing the volume of transmitted information. However, traditional SC methods face inefficiencies due to the repeated transmission of static frames in edge videos, exacerbated by the absence of sensing capabilities, which results in spectrum inefficiency. To address this challenge, we propose an SC with computer vision sensing (SCCVS) framework for edge video transmission. The framework first introduces a compression ratio (CR) adaptive SC (CRSC) model, capable of adjusting CR based on whether the frames are static or dynamic, effectively conserving spectrum resources. Additionally, we implement an object detection and semantic segmentation models-enabled sensing (OSMS) scheme, which intelligently senses the changes in the scene and assesses the significance of each frame through in-context analysis. Hence, the OSMS scheme provides CR prompts to the CRSC model based on real-time sensing results. Moreover, both CRSC and OSMS are designed as lightweight models, ensuring compatibility with resource-constrained sensors commonly used in practical edge applications. Experimental simulations validate the effectiveness of the proposed SCCVS framework, demonstrating its ability to enhance transmission efficiency without sacrificing critical semantic information.

NIJan 22, 2025
PPO-Based Vehicle Control for Ramp Merging Scheme Assisted by Enhanced C-V2X

Qiong Wu, Maoxin Ji, Pingyi Fan et al.

On-ramp merging presents a critical challenge in autonomous driving, as vehicles from merging lanes need to dynamically adjust their positions and speeds while monitoring traffic on the main road to prevent collisions. To address this challenge, we propose a novel merging control scheme based on reinforcement learning, which integrates lateral control mechanisms. This approach ensures the smooth integration of vehicles from the merging lane onto the main road, optimizing both fuel efficiency and passenger comfort. Furthermore, we recognize the impact of vehicle-to-vehicle (V2V) communication on control strategies and introduce an enhanced protocol leveraging Cellular Vehicle-to-Everything (C-V2X) Mode 4. This protocol aims to reduce the Age of Information (AoI) and improve communication reliability. In our simulations, we employ two AoI-based metrics to rigorously assess the protocol's effectiveness in autonomous driving scenarios. By combining the NS3 network simulator with Python, we simulate V2V communication and vehicle control simultaneously. The results demonstrate that the enhanced C-V2X Mode 4 outperforms the standard version, while the proposed control scheme ensures safe and reliable vehicle operation during on-ramp merging.
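
The abstract does not define its two AoI-based metrics, so the sketch below only illustrates the standard notion of instantaneous Age of Information: the time elapsed since the generation of the freshest packet received so far. The function name and the delivery log format are assumptions.

```python
def age_of_information(t, deliveries):
    """Instantaneous AoI at time t.

    deliveries: list of (receive_time, generation_time) pairs (assumed format).
    Returns None if no packet has been received by time t.
    """
    generated = [gen for (rx, gen) in deliveries if rx <= t]
    if not generated:
        return None
    return t - max(generated)

# Toy log (assumed): two packets, each delivered shortly after generation.
deliveries = [(1.0, 0.8), (2.0, 1.9)]

age_before_any = age_of_information(0.5, deliveries)
age_mid = age_of_information(1.5, deliveries)
age_late = age_of_information(2.5, deliveries)
```

AoI grows linearly between deliveries and drops whenever a fresher packet arrives, so lost or delayed V2V messages show up directly as larger ages.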

SYOct 30, 2024
V2X-Assisted Distributed Computing and Control Framework for Connected and Automated Vehicles under Ramp Merging Scenario

Qiong Wu, Jiahou Chu, Pingyi Fan et al.

This paper investigates distributed computing and cooperative control of connected and automated vehicles (CAVs) in a ramp merging scenario under a transportation cyber-physical system. Firstly, a centralized cooperative trajectory planning problem is formulated subject to the safety constraints and traffic performance in the ramp merging scenario, where the trajectories of all vehicles are jointly optimized. To remove the reliance on a central controller and reduce computation time, a distributed solution to this problem, implemented among CAVs through Vehicle-to-Everything (V2X) communication, is proposed. Unlike existing methods, our method can distribute the computational task among CAVs and carry out parallel solving through V2X communication. Then, a multi-vehicle model predictive control (MPC) problem aimed at maximizing system stability and minimizing control input is formulated based on the solution of the first problem, subject to strict safety constraints and input limits. Due to these complex constraints, this problem becomes high-dimensional, centralized, and non-convex. To solve it in a short time, a decomposition and convex reformulation method, namely distributed cooperative iterative model predictive control (DCIMPC), is proposed. This method leverages the communication capability of CAVs to decompose the problem, making full use of the computational resources on vehicles to achieve fast solutions and distributed control. The two problems above, together with their corresponding solution methods, form the systematic framework of V2X-assisted distributed computing and control. Simulations have been conducted to evaluate the framework's convergence, safety, and solving speed. Additionally, extra experiments are conducted to validate the performance of DCIMPC. The results show that our method can greatly improve computation speed without sacrificing system performance.

AIMar 5
BioLLMAgent: A Hybrid Framework with Enhanced Structural Interpretability for Simulating Human Decision-Making in Computational Psychiatry

Zuo Fei, Kezhi Wang, Xiaomin Chen et al.

Computational psychiatry faces a fundamental trade-off: traditional reinforcement learning (RL) models offer interpretability but lack behavioral realism, while large language model (LLM) agents generate realistic behaviors but lack structural interpretability. We introduce BioLLMAgent, a novel hybrid framework that combines validated cognitive models with the generative capabilities of LLMs. The framework comprises three core components: (i) an Internal RL Engine for experience-driven value learning; (ii) an External LLM Shell for high-level cognitive strategies and therapeutic interventions; and (iii) a Decision Fusion Mechanism for integrating components via weighted utility. Comprehensive experiments on the Iowa Gambling Task (IGT) across six clinical and healthy datasets demonstrate that BioLLMAgent accurately reproduces human behavioral patterns while maintaining excellent parameter identifiability (correlations > 0.67). Furthermore, the framework successfully simulates cognitive behavioral therapy (CBT) principles and reveals, through multi-agent dynamics, that community-wide educational interventions may outperform individual treatments. Validated across reward-punishment learning and temporal discounting tasks, BioLLMAgent provides a structurally interpretable "computational sandbox" for testing mechanistic hypotheses and intervention strategies in psychiatric research.

LGMar 5
U-Parking: Distributed UWB-Assisted Autonomous Parking System with Robust Localization and Intelligent Planning

Yiang Wu, Qiong Wu, Pingyi Fan et al.

This demonstration presents U-Parking, a distributed Ultra-Wideband (UWB)-assisted autonomous parking system. By integrating Large Language Models (LLMs)-assisted planning with robust fusion localization and trajectory tracking, it enables reliable automated parking in challenging indoor environments, as validated through real-vehicle demonstrations.

AIFeb 16
Secure and Energy-Efficient Wireless Agentic AI Networks

Yuanyan Song, Kezhi Wang, Xinmian Xu

In this paper, we introduce a secure wireless agentic AI network comprising one supervisor AI agent and multiple other AI agents to provision quality of service (QoS) for users' reasoning tasks while ensuring confidentiality of private knowledge and reasoning outcomes. Specifically, the supervisor AI agent can dynamically assign other AI agents to participate in cooperative reasoning, while the unselected AI agents act as friendly jammers to degrade the eavesdropper's interception performance. To extend the service duration of AI agents, an energy minimization problem is formulated that jointly optimizes AI agent selection, base station (BS) beamforming, and AI agent transmission power, subject to latency and reasoning accuracy constraints. To address the formulated problem, we propose two resource allocation schemes, ASC and LAW, which first decompose it into three sub-problems. Specifically, ASC optimizes each sub-problem iteratively using the proposed alternating direction method of multipliers (ADMM)-based algorithm, semi-definite relaxation (SDR), and successive convex approximation (SCA), while LAW tackles each sub-problem using the proposed large language model (LLM) optimizer within an agentic workflow. The experimental results show that the proposed solutions can reduce network energy consumption by up to 59.1% compared to other benchmark schemes. Furthermore, the proposed schemes are validated using a practical agentic AI system based on Qwen, demonstrating satisfactory reasoning accuracy across various public benchmarks.

CVMar 31, 2025
Learning Velocity and Acceleration: Self-Supervised Motion Consistency for Pedestrian Trajectory Prediction

Yizhou Huang, Yihua Cheng, Kezhi Wang

Understanding human motion is crucial for accurate pedestrian trajectory prediction. Conventional methods typically rely on supervised learning, where ground-truth labels are directly optimized against predicted trajectories. This amplifies the limitations caused by long-tailed data distributions, making it difficult for the model to capture abnormal behaviors. In this work, we propose a self-supervised pedestrian trajectory prediction framework that explicitly models position, velocity, and acceleration. We leverage velocity and acceleration information to enhance position prediction through feature injection and a self-supervised motion consistency mechanism. Our model hierarchically injects velocity features into the position stream. Acceleration features are injected into the velocity stream. This enables the model to predict position, velocity, and acceleration jointly. From the predicted position, we compute corresponding pseudo velocity and acceleration, allowing the model to learn from data-generated pseudo labels and thus achieve self-supervised learning. We further design a motion consistency evaluation strategy grounded in physical principles; it selects the most reasonable predicted motion trend by comparing it with historical dynamics and uses this trend to guide and constrain trajectory generation. We conduct experiments on the ETH-UCY and Stanford Drone datasets, demonstrating that our method achieves state-of-the-art performance on both datasets.
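
The step of computing pseudo velocity and acceleration from predicted positions can be sketched with finite differences. This is an illustrative assumption about how such pseudo labels might be derived; the function name, the sampling interval, and the toy trajectory are not from the paper.

```python
import numpy as np

def pseudo_motion_labels(positions, dt):
    """Finite-difference velocity and acceleration from a (T, 2) position array.

    Returns velocity of shape (T - 1, 2) and acceleration of shape (T - 2, 2).
    """
    velocity = np.diff(positions, axis=0) / dt
    acceleration = np.diff(velocity, axis=0) / dt
    return velocity, acceleration

# Toy trajectory (assumed): constant acceleration of 2 in x, constant speed 3 in y.
dt = 0.4
t = np.arange(5) * dt
positions = np.stack([0.5 * 2.0 * t**2, 3.0 * t], axis=1)

velocity, acceleration = pseudo_motion_labels(positions, dt)
```

For this toy input the recovered acceleration is constant in x and zero in y, which is the kind of physically consistent signal a motion consistency check could compare against historical dynamics.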

LGOct 11, 2024
GAI-Enabled Explainable Personalized Federated Semi-Supervised Learning

Yubo Peng, Feibo Jiang, Li Dong et al.

Federated learning (FL) is a commonly used distributed algorithm for mobile users (MUs) training artificial intelligence (AI) models; however, several challenges arise when applying FL to real-world scenarios, such as label scarcity, non-IID data, and unexplainability. To address these challenges, we propose an explainable personalized FL framework, called XPFL. First, we introduce a generative AI (GAI) assisted personalized federated semi-supervised learning, called GFed. Particularly, in local training, we utilize a GAI model to learn from large unlabeled data and apply knowledge distillation-based semi-supervised learning to train the local FL model using the knowledge acquired from the GAI model. In global aggregation, we obtain the new local FL model by fusing the local and global FL models in specific proportions, allowing each local model to incorporate knowledge from others while preserving its personalized characteristics. Second, we propose an explainable AI mechanism for FL, named XFed. Specifically, in local training, we apply a decision tree to match the input and output of the local FL model. In global aggregation, we utilize t-distributed stochastic neighbor embedding (t-SNE) to visualize the local models before and after aggregation. Finally, simulation results validate the effectiveness of the proposed XPFL framework.
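
Fusing the local and global models "in specific proportions" can be pictured as a per-parameter convex combination. The abstract does not say how the proportions are chosen, so the mixing weight `mix` and the dictionary-of-arrays model format below are assumptions.

```python
import numpy as np

def fuse_models(local_params, global_params, mix):
    """New local model = mix * local + (1 - mix) * global, per parameter tensor.

    mix in [0, 1]: 1.0 keeps the local model unchanged, 0.0 adopts the global one.
    """
    return {
        name: mix * local_params[name] + (1.0 - mix) * global_params[name]
        for name in local_params
    }

# Toy parameters (assumed): a single weight vector per model.
local_params = {"w": np.array([1.0, 2.0])}
global_params = {"w": np.array([3.0, 6.0])}

fused = fuse_models(local_params, global_params, mix=0.75)
```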

CVMay 6, 2024
Visual Language Model based Cross-modal Semantic Communication Systems

Feibo Jiang, Chuanguo Tang, Li Dong et al.

Semantic Communication (SC) has emerged as a novel communication paradigm in recent years, successfully transcending the Shannon physical capacity limits through innovative semantic transmission concepts. Nevertheless, extant Image Semantic Communication (ISC) systems face several challenges in dynamic environments, including low semantic density, catastrophic forgetting, and uncertain Signal-to-Noise Ratio (SNR). To address these challenges, we propose a novel Vision-Language Model-based Cross-modal Semantic Communication (VLM-CSC) system. The VLM-CSC comprises three novel components: (1) Cross-modal Knowledge Base (CKB) is used to extract high-density textual semantics from the semantically sparse image at the transmitter and reconstruct the original image based on textual semantics at the receiver. The transmission of high-density semantics contributes to alleviating bandwidth pressure. (2) Memory-assisted Encoder and Decoder (MED) employ a hybrid long/short-term memory mechanism, enabling the semantic encoder and decoder to overcome catastrophic forgetting in dynamic environments when there is a drift in the distribution of semantic features. (3) Noise Attention Module (NAM) employs attention mechanisms to adaptively adjust the semantic coding and the channel coding based on SNR, ensuring the robustness of the CSC system. The experimental simulations validate the effectiveness, adaptability, and robustness of the CSC system.

AISep 3, 2023
Large AI Model Empowered Multimodal Semantic Communications

Feibo Jiang, Li Dong, Yubo Peng et al.

Multimodal signals, including text, audio, image, and video, can be integrated into Semantic Communication (SC) systems to provide an immersive experience with low latency and high quality at the semantic level. However, multimodal SC faces several challenges, including data heterogeneity, semantic ambiguity, and signal distortion during transmission. Recent advancements in large AI models, particularly in the Multimodal Language Model (MLM) and Large Language Model (LLM), offer potential solutions for addressing these issues. To this end, we propose a Large AI Model-based Multimodal SC (LAM-MSC) framework, where we first present the MLM-based Multimodal Alignment (MMA) that utilizes the MLM to enable the transformation between multimodal and unimodal data while preserving semantic consistency. Then, a personalized LLM-based Knowledge Base (LKB) is proposed, which allows users to perform personalized semantic extraction or recovery through the LLM. This effectively addresses the semantic ambiguity. Next, we apply the Conditional Generative adversarial network-based channel Estimation (CGE) for estimating the wireless channel state information. This approach effectively mitigates the impact of fading channels in SC. Finally, we conduct simulations that demonstrate the superior performance of the LAM-MSC framework.

LGMay 5, 2023
Over-the-Air Federated Averaging with Limited Power and Privacy Budgets

Na Yan, Kezhi Wang, Cunhua Pan et al.

To jointly overcome the communication bottleneck and privacy leakage of wireless federated learning (FL), this paper studies a differentially private over-the-air federated averaging (DP-OTA-FedAvg) system with a limited sum power budget. With DP-OTA-FedAvg, the gradients are aligned by an alignment coefficient and aggregated over the air, and channel noise is employed to protect privacy. We aim to improve the learning performance by jointly designing the device scheduling, alignment coefficient, and the number of aggregation rounds of federated averaging (FedAvg) subject to sum power and privacy constraints. We first present the privacy analysis based on differential privacy (DP) to quantify the impact of the alignment coefficient on privacy preservation in each communication round. Furthermore, to study how the device scheduling, alignment coefficient, and the number of global aggregation rounds affect the learning process, we conduct the convergence analysis of DP-OTA-FedAvg in the cases of convex and non-convex loss functions. Based on these analytical results, we formulate an optimization problem to minimize the optimality gap of the DP-OTA-FedAvg subject to limited sum power and privacy budgets. The problem is solved by decoupling it into two sub-problems. Given the number of communication rounds, we derive the relationship between the number of scheduled devices and the alignment coefficient, which offers a set of potential optimal solution pairs of device scheduling and the alignment coefficient. Thanks to the reduced search space, the optimal solution can be efficiently obtained. The effectiveness of the proposed policy is validated through simulations.
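
The role of the alignment coefficient can be illustrated with an idealized over-the-air aggregation sketch. It assumes perfectly aligned signals that superimpose as a scaled sum plus additive Gaussian channel noise, which is a simplification and not the paper's system model; all names are hypothetical.

```python
import numpy as np

def ota_aggregate(gradients, alignment, noise_std, rng):
    """Idealized over-the-air averaging of K scheduled devices' gradients.

    gradients: (K, D) array. The receiver observes
        alignment * sum_k g_k + noise
    and rescales by 1 / (alignment * K) to estimate the mean gradient.
    """
    num_devices, dim = gradients.shape
    noise = rng.normal(scale=noise_std, size=dim)
    received = alignment * gradients.sum(axis=0) + noise
    return received / (alignment * num_devices)

rng = np.random.default_rng(0)
gradients = rng.normal(size=(4, 3))          # toy gradients (assumed data)

noise_free = ota_aggregate(gradients, alignment=2.0, noise_std=0.0, rng=rng)
noisy = ota_aggregate(gradients, alignment=2.0, noise_std=0.5, rng=rng)
```

In this simplified view the effective noise on the estimate scales as `noise_std / (alignment * K)`, which hints at the tension the abstract describes: a larger alignment coefficient or more scheduled devices improves accuracy but leaves less noise to mask individual updates and costs more transmit power.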

LGSep 23, 2021
Deep Reinforcement Learning-Based Long-Range Autonomous Valet Parking for Smart Cities

Muhammad Khalid, Liang Wang, Kezhi Wang et al.

In this paper, to reduce the congestion rate at the city center and increase the quality of experience (QoE) of each user, the framework of long-range autonomous valet parking (LAVP) is presented, where an Autonomous Vehicle (AV) is deployed in the city to pick up and drop off users at their required spots and then drive autonomously to a car park outside the city center. In this framework, we aim to minimize the overall travel distance of the AV while guaranteeing that all users are served, i.e., picked up and dropped off at their required spots, by optimizing the path planning of the AV and the number of serving time slots. To this end, we first propose a learning-based algorithm, named the Double-Layer Ant Colony Optimization (DL-ACO) algorithm, to solve the above problem iteratively. Then, to make real-time decisions while considering the dynamic environment (i.e., the AV may pick up and drop off users at different locations), we further present a deep reinforcement learning (DRL) based algorithm built on the deep Q network (DQN). The experimental results show that both the DL-ACO and DQN-based algorithms achieve considerable performance.

LGFeb 26, 2021
Private and Utility Enhanced Recommendations with Local Differential Privacy and Gaussian Mixture Model

Jeyamohan Neera, Xiaomin Chen, Nauman Aslam et al.

Recommendation systems rely heavily on users' behavioural and preferential data (e.g. ratings, likes) to produce accurate recommendations. However, users experience privacy concerns due to unethical data aggregation and analytical practices carried out by the Service Providers (SP). Local differential privacy (LDP) based perturbation mechanisms add noise to users' data on the user side before sending it to the SP. The SP then uses the perturbed data to perform recommendations. Although LDP protects the privacy of users from the SP, it causes a substantial decline in predictive accuracy. To address this issue, we propose an LDP-based Matrix Factorization (MF) with a Gaussian Mixture Model (MoG). The LDP perturbation mechanism, Bounded Laplace (BLP), regulates the effect of noise by confining the perturbed ratings to a predetermined domain. We derive a sufficient condition on the scale parameter for BLP to satisfy $\epsilon$-LDP. At the SP, the MoG model estimates the noise added to the perturbed ratings and the MF algorithm predicts missing ratings. Our proposed LDP-based recommendation system improves the recommendation accuracy without violating LDP principles. The empirical evaluations carried out on three real-world datasets, i.e., Movielens, Libimseti and Jester, demonstrate that our method offers a substantial increase in predictive accuracy under a strong privacy guarantee.
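The idea of confining perturbed ratings to a predetermined domain can be sketched as follows. This is a generic bounded Laplace sampler based on rejection sampling, not necessarily the paper's exact mechanism. The function names and the `scale` value used below are hypothetical, and the sketch does not compute the scale that the paper's sufficient condition would require for $\epsilon$-LDP.

```python
import random


def sample_laplace(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def bounded_laplace(rating, lower, upper, scale, rng):
    """Perturb a rating on the user side, keeping the output in [lower, upper].

    Noisy values that fall outside the domain are rejected and redrawn, so
    the released rating always lies in the predetermined range.
    NOTE: `scale` is a free parameter here; choosing it so that the mechanism
    satisfies epsilon-LDP is outside the scope of this sketch.
    """
    while True:
        noisy = rating + sample_laplace(scale, rng)
        if lower <= noisy <= upper:
            return noisy
```

A user holding a rating of 4 on a 1 to 5 scale would call `bounded_laplace(4.0, 1.0, 5.0, scale, rng)` and send only the returned value to the SP.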

ITOct 18, 2020
Sliding Differential Evolution Scheduling for Federated Learning in Bandwidth-Limited Networks

Yifan Luo, Jindan Xu, Wei Xu et al.

Federated learning (FL) in a bandwidth-limited network with energy-limited user equipments (UEs) is under-explored. In this paper, to jointly save the energy consumed by battery-limited UEs and accelerate the convergence of the global model in FL for the bandwidth-limited network, we propose the sliding differential evolution-based scheduling (SDES) policy. To this end, we first formulate an optimization problem that aims to minimize a weighted sum of energy consumption and model training convergence. Then, we apply the SDES with parallel differential evolution (DE) operations in several small-scale windows to address the above problem effectively. Compared with existing scheduling policies, the proposed SDES performs well in reducing energy consumption and improving model convergence, with lower computational complexity.
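For readers unfamiliar with differential evolution, the sketch below shows a standard DE/rand/1/bin loop applied to a toy scheduling cost. It does not reproduce the sliding-window or parallel structure of SDES. The cost function, the 0.5 threshold that turns a gene into a scheduling decision, and all parameter names are assumptions made for illustration.

```python
import random


def differential_evolution(cost, dim, pop_size, generations, f, cr, rng):
    """Minimise `cost` over [0, 1]^dim with DE/rand/1/bin (pop_size >= 4)."""
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    costs = [cost(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cr:
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    trial.append(min(1.0, max(0.0, value)))  # clip to [0, 1]
                else:
                    trial.append(pop[i][d])
            trial_cost = cost(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if trial_cost <= costs[i]:
                pop[i], costs[i] = trial, trial_cost
    best = min(range(pop_size), key=lambda i: costs[i])
    return pop[best], costs[best]


def schedule_cost(x, energy, gain, weight):
    """Hypothetical weighted cost: energy spent minus training benefit.

    A UE is treated as scheduled when its gene exceeds 0.5.
    """
    scheduled = [v > 0.5 for v in x]
    spent = sum(e for e, s in zip(energy, scheduled) if s)
    benefit = sum(g for g, s in zip(gain, scheduled) if s)
    return weight * spent - (1.0 - weight) * benefit
```

Running DE on small windows of UEs rather than on the whole population, as the abstract describes, would keep `dim` small for each run, which is one plausible source of the lower computational complexity it reports.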

SPSep 23, 2020
Multi-Agent Deep Reinforcement Learning Based Trajectory Planning for Multi-UAV Assisted Mobile Edge Computing

Liang Wang, Kezhi Wang, Cunhua Pan et al.

An unmanned aerial vehicle (UAV)-aided mobile edge computing (MEC) framework is proposed, where several UAVs with different trajectories fly over the target area and support the user equipments (UEs) on the ground. We aim to jointly optimize the geographical fairness among all the UEs, the fairness of each UAV's UE-load and the overall energy consumption of UEs. The above optimization problem includes both integer and continuous variables and is challenging to solve. To address this problem, a multi-agent deep reinforcement learning based trajectory control algorithm is proposed for managing the trajectory of each UAV independently, where the popular Multi-Agent Deep Deterministic Policy Gradient (MADDPG) method is applied. Given the UAVs' trajectories, a low-complexity approach is introduced for optimizing the offloading decisions of UEs. We show that our proposed solution achieves considerable performance gains over other traditional algorithms in terms of the fairness of serving UEs, the fairness of UE-load at each UAV and the energy consumption of all the UEs.

DCJul 27, 2020
A Review on Computational Intelligence Techniques in Cloud and Edge Computing

Muhammad Asim, Yong Wang, Kezhi Wang et al.

Cloud computing (CC) is a centralized computing paradigm that accumulates resources centrally and provides these resources to users through the Internet. Although CC holds a large number of resources, it may not be acceptable for real-time mobile applications, as it is usually far away from users geographically. On the other hand, edge computing (EC), which distributes resources to the network edge, enjoys increasing popularity in applications with low-latency and high-reliability requirements. EC provides resources in a decentralized manner and can respond to users' requirements faster than conventional CC, but with limited computing capacity. As both CC and EC are resource-sensitive, several major issues arise, such as how to conduct job scheduling, resource allocation, and task offloading, which significantly influence the performance of the whole system. To tackle these issues, many optimization problems have been formulated. These optimization problems usually have complex properties, such as non-convexity and NP-hardness, which may not be addressed by traditional convex optimization-based solutions. Computational intelligence (CI), consisting of a set of nature-inspired computational approaches, has recently exhibited great potential in addressing these optimization problems in CC and EC. This paper provides an overview of research problems in CC and EC and recent progress in addressing them with the help of CI techniques. Informative discussions and future research trends are also presented, with the aim of offering insights to the readers and motivating new research directions.

DCMay 21, 2020
Distributed Resource Scheduling for Large-Scale MEC Systems: A Multi-Agent Ensemble Deep Reinforcement Learning with Imitation Acceleration

Feibo Jiang, Li Dong, Kezhi Wang et al.

We consider the optimization of distributed resource scheduling to minimize the sum of task latency and energy consumption for all the Internet of things devices (IoTDs) in a large-scale mobile edge computing (MEC) system. To address this problem, we propose a distributed intelligent resource scheduling (DIRS) framework, which includes centralized training relying on the global information and distributed decision making by each agent deployed in each MEC server. More specifically, we first introduce a novel multi-agent ensemble-assisted distributed deep reinforcement learning (DRL) architecture, which can simplify the overall neural network structure of each agent by partitioning the state space and also improve the performance of a single agent by combining decisions of all the agents. Secondly, we apply action refinement to enhance the exploration ability of the proposed DIRS framework, where the near-optimal state-action pairs are obtained by a novel Lévy flight search. Finally, an imitation acceleration scheme is presented to pre-train all the agents, which can significantly accelerate the learning process of the proposed framework through learning the professional experience from a small amount of demonstration data. Extensive simulations are conducted to demonstrate that the proposed DIRS framework is efficient and outperforms the existing benchmark schemes.
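The Lévy flight search mentioned in this abstract can be illustrated with Mantegna's algorithm, a common way to generate heavy-tailed steps. The sketch below is a generic local search, not the DIRS action refinement itself. The function names, the greedy acceptance rule, and the parameters `beta` and `step_size` are assumptions made here.

```python
import math
import random


def levy_step(beta, rng):
    """Draw one heavy-tailed step with Mantegna's algorithm (0 < beta < 2)."""
    sigma_u = (
        math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
        / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
    ) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # guard against division by zero
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def refine_action(action, cost, steps, step_size, beta, rng):
    """Greedy refinement: perturb the best action so far with Levy steps.

    Most steps are small (local exploitation), while occasional long jumps
    help the search escape poor regions of the action space.
    """
    best, best_cost = list(action), cost(action)
    for _ in range(steps):
        candidate = [a + step_size * levy_step(beta, rng) for a in best]
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
    return best, best_cost
```

In a DRL setting, `cost` would stand for the latency-plus-energy objective evaluated for a candidate scheduling action, and the refined state-action pairs could then be stored as training data.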

LGJan 24, 2020
Stacked Auto Encoder Based Deep Reinforcement Learning for Online Resource Scheduling in Large-Scale MEC Networks

Feibo Jiang, Kezhi Wang, Li Dong et al.

An online resource scheduling framework is proposed for minimizing the sum of weighted task latency for all the Internet of things (IoT) users, by optimizing offloading decision, transmission power and resource allocation in the large-scale mobile edge computing (MEC) system. Towards this end, a deep reinforcement learning (DRL) based solution is proposed, which includes the following components. Firstly, a related and regularized stacked auto encoder (2r-SAE) with unsupervised learning is applied to perform data compression and representation for high dimensional channel quality information (CQI) data, which can reduce the state space for DRL. Secondly, we present an adaptive simulated annealing based approach (ASA) as the action search method of DRL, in which an adaptive h-mutation is used to guide the search direction and an adaptive iteration is proposed to enhance the search efficiency during the DRL process. Thirdly, a preserved and prioritized experience replay (2p-ER) is introduced to assist the DRL to train the policy network and find the optimal offloading policy. Numerical results are provided to demonstrate that the proposed algorithm can achieve near-optimal performance while significantly decreasing the computational time compared with existing benchmarks.