AINov 25, 2022
Less Data, More Knowledge: Building Next Generation Semantic Communication NetworksChristina Chaccour, Walid Saad, Merouane Debbah et al.
Semantic communication is viewed as a revolutionary paradigm that can potentially transform how we design and operate wireless communication systems. However, despite a recent surge of research activities in this area, the research landscape remains limited. In this tutorial, we present the first rigorous vision of a scalable end-to-end semantic communication network that is founded on novel concepts from artificial intelligence (AI), causal reasoning, and communication theory. We first discuss how the design of semantic communication networks requires a move from data-driven networks towards knowledge-driven ones. Subsequently, we highlight the necessity of creating semantic representations of data that satisfy the key properties of minimalism, generalizability, and efficiency so as to do more with less. We then explain how those representations can form the basis a so-called semantic language. By using semantic representation and languages, we show that the traditional transmitter and receiver now become a teacher and apprentice. Then, we define the concept of reasoning by investigating the fundamentals of causal representation learning and their role in designing semantic communication networks. We demonstrate that reasoning faculties are majorly characterized by the ability to capture causal and associational relationships in datastreams. For such reasoning-driven networks, we propose novel and essential semantic communication metrics that include new "reasoning capacity" measures that could go beyond Shannon's bound to capture the convergence of computing and communication. Finally, we explain how semantic communications can be scaled to large-scale networks (6G and beyond). In a nutshell, we expect this tutorial to provide a comprehensive reference on how to properly build, analyze, and deploy future semantic communication networks.
ITOct 7, 2010
Coalition Formation Games for Distributed Cooperation Among Roadside Units in Vehicular NetworksWalid Saad, Zhu Han, Are Hjørungnes et al.
Vehicle-to-roadside (V2R) communications enable vehicular networks to support a wide range of applications for enhancing the efficiency of road transportation. While existing work focused on non-cooperative techniques for V2R communications between vehicles and roadside units (RSUs), this paper investigates novel cooperative strategies among the RSUs in a vehicular network. We propose a scheme whereby, through cooperation, the RSUs in a vehicular network can coordinate the classes of data being transmitted through V2R communications links to the vehicles. This scheme improves the diversity of the information circulating in the network while exploiting the underlying content-sharing vehicle-to-vehicle communication network. We model the problem as a coalition formation game with transferable utility and we propose an algorithm for forming coalitions among the RSUs. For coalition formation, each RSU can take an individual decision to join or leave a coalition, depending on its utility which accounts for the generated revenues and the costs for coalition coordination. We show that the RSUs can self-organize into a Nash-stable partition and adapt this partition to environmental changes. Simulation results show that, depending on different scenarios, coalition formation presents a performance improvement, in terms of the average payoff per RSU, ranging between 20.5% and 33.2%, relative to the non-cooperative case.
CVMay 13, 2022
Tensor Decompositions for Hyperspectral Data Processing in Remote Sensing: A Comprehensive ReviewMinghua Wang, Danfeng Hong, Zhu Han et al.
Owing to the rapid development of sensor technology, hyperspectral (HS) remote sensing (RS) imaging has provided a significant amount of spatial and spectral information for the observation and analysis of the Earth's surface at a distance of data acquisition devices, such as aircraft, spacecraft, and satellite. The recent advancement and even revolution of the HS RS technique offer opportunities to realize the full potential of various applications, while confronting new challenges for efficiently processing and analyzing the enormous HS acquisition data. Due to the maintenance of the 3-D HS inherent structure, tensor decomposition has aroused widespread concern and research in HS data processing tasks over the past decades. In this article, we aim at presenting a comprehensive overview of tensor decomposition, specifically contextualizing the five broad topics in HS data processing, and they are HS restoration, compressed sensing, anomaly detection, super-resolution, and spectral unmixing. For each topic, we elaborate on the remarkable achievements of tensor decomposition models for HS RS with a pivotal description of the existing methodologies and a representative exhibition on the experimental results. As a result, the remaining challenges of the follow-up research directions are outlined and discussed from the perspective of the real HS RS practices and tensor decomposition merged with advanced priors and even with deep neural networks. This article summarizes different tensor decomposition-based HS data processing methods and categorizes them into different classes from simple adoptions to complex combinations with other priors for the algorithm beginners. We also expect this survey can provide new investigations and development trends for the experienced researchers who understand tensor decomposition and HS RS to some extent.
AIFeb 16, 2023
Generative AI-empowered Simulation for Autonomous Driving in Vehicular Mixed Reality MetaversesMinrui Xu, Dusit Niyato, Junlong Chen et al.
In the vehicular mixed reality (MR) Metaverse, the distance between physical and virtual entities can be overcome by fusing the physical and virtual environments with multi-dimensional communications in autonomous driving systems. Assisted by digital twin (DT) technologies, connected autonomous vehicles (AVs), roadside units (RSU), and virtual simulators can maintain the vehicular MR Metaverse via digital simulations for sharing data and making driving decisions collaboratively. However, large-scale traffic and driving simulation via realistic data collection and fusion from the physical world for online prediction and offline training in autonomous driving systems are difficult and costly. In this paper, we propose an autonomous driving architecture, where generative AI is leveraged to synthesize unlimited conditioned traffic and driving data in simulations for improving driving safety and traffic efficiency. First, we propose a multi-task DT offloading model for the reliable execution of heterogeneous DT tasks with different requirements at RSUs. Then, based on the preferences of AV's DTs and collected realistic data, virtual simulators can synthesize unlimited conditioned driving and traffic datasets to further improve robustness. Finally, we propose a multi-task enhanced auction-based mechanism to provide fine-grained incentives for RSUs in providing resources for autonomous driving. The property analysis and experimental results demonstrate that the proposed mechanism and architecture are strategy-proof and effective, respectively.
AIJan 18, 2023
Generative AI-empowered Effective Physical-Virtual Synchronization in the Vehicular MetaverseMinrui Xu, Dusit Niyato, Hongliang Zhang et al.
Metaverse seamlessly blends the physical world and virtual space via ubiquitous communication and computing infrastructure. In transportation systems, the vehicular Metaverse can provide a fully-immersive and hyperreal traveling experience (e.g., via augmented reality head-up displays, AR-HUDs) to drivers and users in autonomous vehicles (AVs) via roadside units (RSUs). However, provisioning real-time and immersive services necessitates effective physical-virtual synchronization between physical and virtual entities, i.e., AVs and Metaverse AR recommenders (MARs). In this paper, we propose a generative AI-empowered physical-virtual synchronization framework for the vehicular Metaverse. In physical-to-virtual synchronization, digital twin (DT) tasks generated by AVs are offloaded for execution in RSU with future route generation. In virtual-to-physical synchronization, MARs customize diverse and personal AR recommendations via generative AI models based on user preferences. Furthermore, we propose a multi-task enhanced auction-based mechanism to match and price AVs and MARs for RSUs to provision real-time and effective services. Finally, property analysis and experimental results demonstrate that the proposed mechanism is strategy-proof and adverse-selection free while increasing social surplus by 50%.
LGSep 20, 2023
Federated Learning in Intelligent Transportation Systems: Recent Applications and Open ProblemsShiying Zhang, Jun Li, Long Shi et al.
Intelligent transportation systems (ITSs) have been fueled by the rapid development of communication technologies, sensor technologies, and the Internet of Things (IoT). Nonetheless, due to the dynamic characteristics of the vehicle networks, it is rather challenging to make timely and accurate decisions of vehicle behaviors. Moreover, in the presence of mobile wireless communications, the privacy and security of vehicle information are at constant risk. In this context, a new paradigm is urgently needed for various applications in dynamic vehicle environments. As a distributed machine learning technology, federated learning (FL) has received extensive attention due to its outstanding privacy protection properties and easy scalability. We conduct a comprehensive survey of the latest developments in FL for ITS. Specifically, we initially research the prevalent challenges in ITS and elucidate the motivations for applying FL from various perspectives. Subsequently, we review existing deployments of FL in ITS across various scenarios, and discuss specific potential issues in object recognition, traffic management, and service providing scenarios. Furthermore, we conduct a further analysis of the new challenges introduced by FL deployment and the inherent limitations that FL alone cannot fully address, including uneven data distribution, limited storage and computing power, and potential privacy and security concerns. We then examine the existing collaborative technologies that can help mitigate these challenges. Lastly, we discuss the open challenges that remain to be addressed in applying FL in ITS and propose several future research directions.
ITOct 4, 2023
Semi-Federated Learning: Convergence Analysis and Optimization of A Hybrid Learning FrameworkJingheng Zheng, Wanli Ni, Hui Tian et al.
Under the organization of the base station (BS), wireless federated learning (FL) enables collaborative model training among multiple devices. However, the BS is merely responsible for aggregating local updates during the training process, which incurs a waste of the computational resource at the BS. To tackle this issue, we propose a semi-federated learning (SemiFL) paradigm to leverage the computing capabilities of both the BS and devices for a hybrid implementation of centralized learning (CL) and FL. Specifically, each device sends both local gradients and data samples to the BS for training a shared global model. To improve communication efficiency over the same time-frequency resources, we integrate over-the-air computation for aggregation and non-orthogonal multiple access for transmission by designing a novel transceiver structure. To gain deep insights, we conduct convergence analysis by deriving a closed-form optimality gap for SemiFL and extend the result to two extra cases. In the first case, the BS uses all accumulated data samples to calculate the CL gradient, while a decreasing learning rate is adopted in the second case. Our analytical results capture the destructive effect of wireless communication and show that both FL and CL are special cases of SemiFL. Then, we formulate a non-convex problem to reduce the optimality gap by jointly optimizing the transmit power and receive beamformers. Accordingly, we propose a two-stage algorithm to solve this intractable problem, in which we provide the closed-form solutions to the beamformers. Extensive simulation results on two real-world datasets corroborate our theoretical analysis, and show that the proposed SemiFL outperforms conventional FL and achieves 3.2% accuracy gain on the MNIST dataset compared to state-of-the-art benchmarks.
SPApr 11
Energy-Efficient Hybrid Data Computation via Coordinated AirComp and Edge OffloadingYudan Jiang, Xiao Tang, Jinxin Liu et al.
The development of 6G networks brings an increasing variety of data services, which motivates the hybrid computation paradigm that coordinates the over-the-air computation (AirComp) and edge computing for diverse and effective data processing. In this paper, we address this emerging issue of hybrid data computation from an energy-efficiency perspective, where the coexistence of both types induces resource competition and interference, and thus complicates the network management. Accordingly, we formulate the problem to minimize the overall energy consumption including the data transmission and computation, subject to the offloading capacity and aggregation accuracy. We then propose a block coordinate descent framework that decomposes and solves the subproblems including the user scheduling, power control, and transceiver scaling, which are then iterated towards a coordinated hybrid computation solution. Simulation results confirm that our coordinated approach achieves significant energy savings compared to baseline strategies, demonstrating its effectiveness in creating a well-coordinated and sustainable hybrid computing environment.
NIDec 26, 2022
Beyond 5G Networks: Integration of Communication, Computing, Caching, and ControlMusbahu Mohammed Adam, Liqiang Zhao, Kezhi Wang et al.
In recent years, the exponential proliferation of smart devices with their intelligent applications poses severe challenges on conventional cellular networks. Such challenges can be potentially overcome by integrating communication, computing, caching, and control (i4C) technologies. In this survey, we first give a snapshot of different aspects of the i4C, comprising background, motivation, leading technological enablers, potential applications, and use cases. Next, we describe different models of communication, computing, caching, and control (4C) to lay the foundation of the integration approach. We review current state-of-the-art research efforts related to the i4C, focusing on recent trends of both conventional and artificial intelligence (AI)-based integration approaches. We also highlight the need for intelligence in resources integration. Then, we discuss integration of sensing and communication (ISAC) and classify the integration approaches into various classes. Finally, we propose open challenges and present future research directions for beyond 5G networks, such as 6G.
CVMay 27, 2022
LEAF + AIO: Edge-Assisted Energy-Aware Object Detection for Mobile Augmented RealityHaoxin Wang, BaekGyu Kim, Jiang Xie et al.
Today very few deep learning-based mobile augmented reality (MAR) applications are applied in mobile devices because they are significantly energy-guzzling. In this paper, we design an edge-based energy-aware MAR system that enables MAR devices to dynamically change their configurations, such as CPU frequency, computation model size, and image offloading frequency based on user preferences, camera sampling rates, and available radio resources. Our proposed dynamic MAR configuration adaptations can minimize the per frame energy consumption of multiple MAR clients without degrading their preferred MAR performance metrics, such as latency and detection accuracy. To thoroughly analyze the interactions among MAR configurations, user preferences, camera sampling rate, and energy consumption, we propose, to the best of our knowledge, the first comprehensive analytical energy model for MAR devices. Based on the proposed analytical model, we design a LEAF optimization algorithm to guide the MAR configuration adaptation and server radio resource allocation. An image offloading frequency orchestrator, coordinating with the LEAF, is developed to adaptively regulate the edge-based object detection invocations and to further improve the energy efficiency of MAR devices. Extensive evaluations are conducted to validate the performance of the proposed analytical model and algorithms.
LGOct 20, 2023
An Efficient Federated Learning Framework for Training Semantic Communication SystemLoc X. Nguyen, Huy Q. Le, Ye Lin Tun et al.
Semantic communication has emerged as a pillar for the next generation of communication systems due to its capabilities in alleviating data redundancy. Most semantic communication systems are built upon advanced deep learning models whose training performance heavily relies on data availability. Existing studies often make unrealistic assumptions of a readily accessible data source, where in practice, data is mainly created on the client side. Due to privacy and security concerns, the transmission of data is restricted, which is necessary for conventional centralized training schemes. To address this challenge, we explore semantic communication in a federated learning (FL) setting that utilizes client data without leaking privacy. Additionally, we design our system to tackle the communication overhead by reducing the quantity of information delivered in each global round. In this way, we can save significant bandwidth for resource-limited devices and reduce overall network traffic. Finally, we introduce a mechanism to aggregate the global model from clients, called FedLol. Extensive simulation results demonstrate the effectiveness of our proposed technique compared to baseline methods.
ITMay 28
A Comprehensive Survey on Semantic Communication in Non-Terrestrial Networks: Architectures, Methodologies, and ChallengesLoc X. Nguyen, Avi Deb Raha, Huy Q. Le et al.
The sixth-generation wireless networks are envisioned to deliver ubiquitous, seamless, and intelligent connectivity that reaches far beyond the limits of terrestrial infrastructure. Non-terrestrial networks (NTNs) are central to this vision, extending coverage to underserved regions, remote terrain, and disaster zones that terrestrial deployment cannot economically reach. However, NTN architecture faces numerous limitations: severe path loss over long distances, long propagation delays, large and time-varying Doppler shifts, limited visibility windows, and tight on-board energy and computing budgets. Semantic communication (SemCom), which conveys the meaning of data rather than its raw bit-level representation, is unusually well matched to these conditions: extreme compression rate for task-oriented eases bandwidth scarcity, deep joint source-channel coding prevents the cliff effect due to low signal-to-noise ratio, and generative-AI reconstructs content from sparse cues that survive rain-faded or blocked links. This observation, that each NTN limitation maps onto a SemCom property that addresses it, motivates our survey. We first walk through the NTN limitations one by one, pairing each with the SemCom design choices that complement it, then we organize the literature along three axes: the NTN platform, the semantic methodology, and the supporting techniques, and follow this with platform-by-platform deep dives on satellite-centric, UAV/HAPS-centric, and integrated SAGIN systems. The survey concludes by identifying open research problems, gaps in existing standards, and future directions, including the application of foundation models, energy-aware scheduling, and quantum-assisted SemCom for deep space communication.
ITMar 31
Enhanced Channel Estimation for Flexible Intelligent Metasurface-Aided Communication SystemsJinyue Jiang, Jiancheng An, Lu Gan et al.
Flexible intelligent metasurface (FIM) has recently received considerable interest due to its advantage in realizing a better channel condition by dynamically morphing its surface shape. An FIM consists of multiple elements deposited on a flexible substrate. These elements can not only transmit signals, but also adapt their displacements in a direction perpendicular to the FIM surface via an attached controller. In this paper, we consider the channel estimation problem for the uplink of an FIM-enhanced communication system via customizing the orthogonal matching pursuit (OMP) method. Specifically, we formulate an optimization problem of minimizing the column coherence of the measurement matrix by optimizing the FIM's surface shape, subject to the morphing range constraint. Based on the estimated direction of arrival (DOA) and channel gain, we further investigate the signal-to-noise ratio (SNR) improvement in the FIM-enhanced downlink multiple-input single-output (MISO) system. Numerical results demonstrate that an FIM significantly outperforms a conventional rigid uniform planar array (UPA), thereby showing that FIM can substantially improve channel estimation accuracy and achieve SNR improvement, even when using estimated channel parameters.
SPApr 4
The Role of ISAC in 6G Networks: Enabling Next-Generation Wireless SystemsMuhammad Umar Farooq Qaisar, Weijie Yuan, Onur Günlü et al.
The commencement of the sixth-generation (6G) wireless networks represents a fundamental shift in the integration of communication and sensing technologies to support next-generation applications. Integrated sensing and communication (ISAC) is a key concept in this evolution, enabling end-to-end support for both communication and sensing within a unified framework. It enhances spectrum efficiency, reduces latency, and supports diverse use cases, including smart cities, autonomous systems, and perceptive environments. This tutorial provides a comprehensive overview of ISAC's role in 6G networks, beginning with its evolution since 5G and the technical drivers behind its adoption. Core principles and system variations of ISAC are introduced, followed by an in-depth discussion of the enabling technologies that facilitate its practical deployment. The paper further analyzes current research directions to highlight key challenges, open issues, and emerging trends. Design insights and recommendations are also presented to support future development and implementation. This work ultimately tries to address three central questions: Why is ISAC essential for 6G? What innovations does it bring? How will it shape the future of wireless communication?
CVJul 9, 2024Code
Vision Language Model-Empowered Contract Theory for AIGC Task Allocation in TeleoperationZijun Zhan, Yaxian Dong, Yuqing Hu et al.
Integrating low-light image enhancement techniques, in which diffusion-based AI-generated content (AIGC) models are promising, is necessary to enhance nighttime teleoperation. Remarkably, the AIGC model is computation-intensive, thus necessitating the allocation of AIGC tasks to edge servers with ample computational resources. Given the distinct cost of the AIGC model trained with varying-sized datasets and AIGC tasks possessing disparate demand, it is imperative to formulate a differential pricing strategy to optimize the utility of teleoperators and edge servers concurrently. Nonetheless, the pricing strategy formulation is under information asymmetry, i.e., the demand (e.g., the difficulty level of AIGC tasks and their distribution) of AIGC tasks is hidden information to edge servers. Additionally, manually assessing the difficulty level of AIGC tasks is tedious and unnecessary for teleoperators. To this end, we devise a framework of AIGC task allocation assisted by the Vision Language Model (VLM)-empowered contract theory, which includes two components: VLM-empowered difficulty assessment and contract theory-assisted AIGC task allocation. The first component enables automatic and accurate AIGC task difficulty assessment. The second component is capable of formulating the pricing strategy for edge servers under information asymmetry, thereby optimizing the utility of both edge servers and teleoperators. The simulation results demonstrated that our proposed framework can improve the average utility of teleoperators and edge servers by 10.88~12.43% and 1.4~2.17%, respectively. Code and data are available at https://github.com/ZiJun0819/VLM-Contract-Theory.
NIOct 12, 2023
Semantic-Forward Relaying: A Novel Framework Towards 6G Cooperative CommunicationsWensheng Lin, Yuna Yan, Lixin Li et al.
This letter proposes a novel relaying framework, semantic-forward (SF), for cooperative communications towards the sixth-generation (6G) wireless networks. The SF relay extracts and transmits the semantic features, which reduces forwarding payload, and also improves the network robustness against intra-link errors. Based on the theoretical basis for cooperative communications with side information and the turbo principle, we design a joint source-channel coding algorithm to iteratively exchange the extrinsic information for enhancing the decoding gains at the destination. Surprisingly, simulation results indicate that even in bad channel conditions, SF relaying can still effectively improve the recovered information quality.
NISep 27, 2024
Enhancing Spectrum Efficiency in 6G Satellite Networks: A GAIL-Powered Policy Learning via Asynchronous Federated Inverse Reinforcement LearningSheikh Salman Hassan, Yu Min Park, Yan Kyaw Tun et al.
In this paper, a novel generative adversarial imitation learning (GAIL)-powered policy learning approach is proposed for optimizing beamforming, spectrum allocation, and remote user equipment (RUE) association in NTNs. Traditional reinforcement learning (RL) methods for wireless network optimization often rely on manually designed reward functions, which can require extensive parameter tuning. To overcome these limitations, we employ inverse RL (IRL), specifically leveraging the GAIL framework, to automatically learn reward functions without manual design. We augment this framework with an asynchronous federated learning approach, enabling decentralized multi-satellite systems to collaboratively derive optimal policies. The proposed method aims to maximize spectrum efficiency (SE) while meeting minimum information rate requirements for RUEs. To address the non-convex, NP-hard nature of this problem, we combine the many-to-one matching theory with a multi-agent asynchronous federated IRL (MA-AFIRL) framework. This allows agents to learn through asynchronous environmental interactions, improving training efficiency and scalability. The expert policy is generated using the Whale optimization algorithm (WOA), providing data to train the automatic reward function within GAIL. Simulation results show that the proposed MA-AFIRL method outperforms traditional RL approaches, achieving a $14.6\%$ improvement in convergence and reward value. The novel GAIL-driven policy learning establishes a novel benchmark for 6G NTN optimization.
LGJul 31, 2024
Semantic Successive Refinement: A Generative AI-aided Semantic Communication FrameworkKexin Zhang, Lixin Li, Wensheng Lin et al.
Semantic Communication (SC) is an emerging technology aiming to surpass the Shannon limit. Traditional SC strategies often minimize signal distortion between the original and reconstructed data, neglecting perceptual quality, especially in low Signal-to-Noise Ratio (SNR) environments. To address this issue, we introduce a novel Generative AI Semantic Communication (GSC) system for single-user scenarios. This system leverages deep generative models to establish a new paradigm in SC. Specifically, At the transmitter end, it employs a joint source-channel coding mechanism based on the Swin Transformer for efficient semantic feature extraction and compression. At the receiver end, an advanced Diffusion Model (DM) reconstructs high-quality images from degraded signals, enhancing perceptual details. Additionally, we present a Multi-User Generative Semantic Communication (MU-GSC) system utilizing an asynchronous processing model. This model effectively manages multiple user requests and optimally utilizes system resources for parallel processing. Simulation results on public datasets demonstrate that our generative AI semantic communication systems achieve superior transmission efficiency and enhanced communication content quality across various channel conditions. Compared to CNN-based DeepJSCC, our methods improve the Peak Signal-to-Noise Ratio (PSNR) by 17.75% in Additive White Gaussian Noise (AWGN) channels and by 20.86% in Rayleigh channels.
SPOct 19, 2023
SemantIC: Semantic Interference Cancellation Towards 6G Wireless CommunicationsWensheng Lin, Yuna Yan, Lixin Li et al.
This letter proposes a novel anti-interference technique, semantic interference cancellation (SemantIC), for enhancing information quality towards the sixth-generation (6G) wireless networks. SemantIC only requires the receiver to concatenate the channel decoder with a semantic auto-encoder. This constructs a turbo loop which iteratively and alternately eliminates noise in the signal domain and the semantic domain. From the viewpoint of network information theory, the neural network of the semantic auto-encoder stores side information by training, and provides side information in iterative decoding, as an implementation of the Wyner-Ziv theorem. Simulation results verify the performance improvement by SemantIC without extra channel resource cost.
LGMay 29, 2022
Mixture GAN For Modulation Classification Resiliency Against Adversarial AttacksEyad Shtaiwi, Ahmed El Ouadrhiri, Majid Moradikia et al.
Automatic modulation classification (AMC) using the Deep Neural Network (DNN) approach outperforms the traditional classification techniques, even in the presence of challenging wireless channel environments. However, the adversarial attacks cause the loss of accuracy for the DNN-based AMC by injecting a well-designed perturbation to the wireless channels. In this paper, we propose a novel generative adversarial network (GAN)-based countermeasure approach to safeguard the DNN-based AMC systems against adversarial attack examples. GAN-based aims to eliminate the adversarial attack examples before feeding to the DNN-based classifier. Specifically, we have shown the resiliency of our proposed defense GAN against the Fast-Gradient Sign method (FGSM) algorithm as one of the most potent kinds of attack algorithms to craft the perturbed signals. The existing defense-GAN has been designed for image classification and does not work in our case where the above-mentioned communication system is considered. Thus, our proposed countermeasure approach deploys GANs with a mixture of generators to overcome the mode collapsing problem in a typical GAN facing radio signal classification problem. Simulation results show the effectiveness of our proposed defense GAN so that it could enhance the accuracy of the DNN-based AMC under adversarial attacks to 81%, approximately.
LGAug 22, 2024
Human-In-The-Loop Machine Learning for Safe and Ethical Autonomous Vehicles: Principles, Challenges, and OpportunitiesYousef Emami, Luis Almeida, Kai Li et al.
Rapid advances in Machine Learning (ML) have triggered new trends in Autonomous Vehicles (AVs). ML algorithms play a crucial role in interpreting sensor data, predicting potential hazards, and optimizing navigation strategies. However, achieving full autonomy in cluttered and complex situations, such as intricate intersections, diverse sceneries, varied trajectories, and complex missions, is still challenging, and the cost of data labeling remains a significant bottleneck. The adaptability and robustness of humans in complex scenarios motivate the inclusion of humans in the ML process, leveraging their creativity, ethical power, and emotional intelligence to improve ML effectiveness. The scientific community knows this approach as Human-In-The-Loop Machine Learning (HITL-ML). Towards safe and ethical autonomy, we present a review of HITL-ML for AVs, focusing on Curriculum Learning (CL), Human-In-The-Loop Reinforcement Learning (HITL-RL), Active Learning (AL), and ethical principles. In CL, human experts systematically train ML models by starting with simple tasks and gradually progressing to more difficult ones. HITL-RL significantly enhances the RL process by incorporating human input through techniques like reward shaping, action injection, and interactive learning. AL streamlines the annotation process by targeting specific instances that need to be labeled with human oversight, reducing the overall time and cost associated with training. Ethical principles must be embedded in AVs to align their behavior with societal values and norms. In addition, we provide insights and specify future research directions.
CVAug 8, 2024
Semantic Communication based on Large Language Model for Underwater Image TransmissionWeilong Chen, Wenxuan Xu, Haoran Chen et al.
Underwater communication is essential for environmental monitoring, marine biology research, and underwater exploration. Traditional underwater communication faces limitations like low bandwidth, high latency, and susceptibility to noise, while semantic communication (SC) offers a promising solution by focusing on the exchange of semantics rather than symbols or bits. However, SC encounters challenges in underwater environments, including semantic information mismatch and difficulties in accurately identifying and transmitting critical information that aligns with the diverse requirements of underwater applications. To address these challenges, we propose a novel Semantic Communication (SC) framework based on Large Language Models (LLMs). Our framework leverages visual LLMs to perform semantic compression and prioritization of underwater image data according to the query from users. By identifying and encoding key semantic elements within the images, the system selectively transmits high-priority information while applying higher compression rates to less critical regions. On the receiver side, an LLM-based recovery mechanism, along with Global Vision ControlNet and Key Region ControlNet networks, aids in reconstructing the images, thereby enhancing communication efficiency and robustness. Our framework reduces the overall data size to 0.8\% of the original. Experimental results demonstrate that our method significantly outperforms existing approaches, ensuring high-quality, semantically accurate image reconstruction.
DCMar 10
Hierarchical Observe-Orient-Decide-Act Enabled UAV Swarms in Uncertain Environments: Frameworks, Potentials, and ChallengesZiye Jia, Yao Wu, Qihui Wu et al.
Unmanned aerial vehicle (UAV) swarms are increasingly explored for their potentials in various applications such as surveillance, disaster response, and military. However, UAV swarms face significant challenges of implementing effective and rapid decisions under dynamic and uncertain environments. The traditional decision-making frameworks, mainly relying on centralized control and rigid architectures, are limited by their adaptability and scalability especially in complex environments. To overcome these challenges, in this paper, we propose a hierarchical Observe-Orient-Decide-Act (H-OODA) loop based framework for the UAV swarm operation in uncertain environments, which is implemented by embedding the classical OODA loop across the cloud-edge-terminal layers, and leveraging the network function virtualization (NFV) technology to provide flexible and scalable decision-making functions. In addition, based on the proposed H-OODA framework, we joint autonomous decision-making and cooperative control to enhance the adaptability and efficiency of UAV swarms. Furthermore, we present some typical case studies to verify the improvement and efficiency of the proposed framework. Finally, the potential challenges and possible directions are analyzed to provide insights for the future H-OODA enabled UAV swarms.
NIMar 12
Efficient Cross-View Localization in 6G Space-Air-Ground Integrated NetworkMin Hao, Yanbing Xu, Maoqiang Wu et al.
Recently, visual localization has become an important supplement to improve localization reliability, and cross-view approaches can greatly enhance coverage and adaptability. Meanwhile, future 6G will enable a globally covered mobile communication system, with a space-air-ground integrated network (SAGIN) serving as key supporting architecture. Inspired by this, we explore an integration of cross-view localization (CVL) with 6G SAGIN, thereby enhancing its performance in latency, energy consumption, and privacy protection. First, we provide a comprehensive review of CVL and SAGIN, highlighting their capabilities, integration opportunities, and potential applications. Benefiting from the fast and extensive image collection and transmission capabilities of the 6G SAGIN architecture, CVL achieves higher localization accuracy and faster processing speed. Then, we propose a split-inference framework for implementing CVL, which fully leverages the distributed communication and computing resources of the 6G SAGIN architecture. Subsequently, we conduct joint optimization of communication, computation, and confidentiality within the proposed split-inference framework, aiming to provide a paradigm and a direction for making CVL efficient. Experimental results validate the effectiveness of the proposed framework and provide solutions to the optimization problem. Finally, we discuss potential research directions for 6G SAGIN-enabled CVL.
IVDec 5, 2024Code
Dual-Branch Subpixel-Guided Network for Hyperspectral Image ClassificationZhu Han, Jin Yang, Lianru Gao et al.
Deep learning (DL) has been widely applied into hyperspectral image (HSI) classification owing to its promising feature learning and representation capabilities. However, limited by the spatial resolution of sensors, existing DL-based classification approaches mainly focus on pixel-level spectral and spatial information extraction through complex network architecture design, while ignoring the existence of mixed pixels in actual scenarios. To tackle this difficulty, we propose a novel dual-branch subpixel-guided network for HSI classification, called DSNet, which automatically integrates subpixel information and convolutional class features by introducing a deep autoencoder unmixing architecture to enhance classification performance. DSNet is capable of fully considering physically nonlinear properties within subpixels and adaptively generating diagnostic abundances in an unsupervised manner to achieve more reliable decision boundaries for class label distributions. The subpixel fusion module is designed to ensure high-quality information fusion across pixel and subpixel features, further promoting stable joint classification. Experimental results on three benchmark datasets demonstrate the effectiveness and superiority of DSNet compared with state-of-the-art DL-based HSI classification approaches. The codes will be available at https://github.com/hanzhu97702/DSNet, contributing to the remote sensing community.
ITApr 14
Anchor-Aided Multi-User Semantic Communication with Adaptive DecodersLoc X. Nguyen, Phuong-Nam Tran, Trung Thanh Pham et al.
Semantic communication (SemCom) is accelerating its momentum to catch up with the massive increase in users' demands in both quantity and quality, with the assistance of advanced deep learning (DL) techniques. Specifically, SemCom can actively embed the semantic meaning of the data into the transmission process, while eliminating statistical redundancy to preserve bandwidth resources for other users. Therefore, the transmitter encodes the message in the most concise way, while the receiver tries to interpret the message with the DL model and its knowledge of the transmitter's intended meaning. Most existing works only consider one transmitter and one receiver, which limits their ability to address the diversity in users' models and capabilities. Therefore, in this paper, we propose a multi-user semantic communication system where each user is equipped with a distinct DL-based joint source-channel decoder architecture, reflecting the diversity in computing capacity. The challenging issue with the proposed system is the catastrophic forgetting property of neural networks, where the DL-based encoder fails to encode the data for the previous user when being trained with a new user. To address this, we propose an anchor decoder with an architecture that is symmetric to the encoder. The symmetric decoder has the same computational capacity as the encoder, providing feedback that aligns with the encoder's extraction capabilities and enhances optimization efficiency. The parameters of the optimized encoder are then frozen and used to train decoders for various users, aligning them with the encoder outputs. Finally, we conduct a series of simulation experiments to validate the proposed framework against other benchmarks.
LGMar 10
A Multi-Prototype-Guided Federated Knowledge Distillation Approach in AI-RAN Enabled Multi-Access Edge Computing SystemLuyao Zou, Hayoung Oh, Chu Myaet Thwal et al.
With the development of wireless network, Multi-Access Edge Computing (MEC) and Artificial Intelligence (AI)-native Radio Access Network (RAN) have attracted significant attention. Particularly, the integration of AI-RAN and MEC is envisioned to transform network efficiency and responsiveness. Therefore, it is valuable to investigate AI-RAN enabled MEC system. Federated learning (FL) nowadays is emerging as a promising approach for AI-RAN enabled MEC system, in which edge devices are enabled to train a global model cooperatively without revealing their raw data. However, conventional FL encounters the challenge in processing the non-independent and identically distributed (non-IID) data. Single prototype obtained by averaging the embedding vectors per class can be employed in FL to handle the data heterogeneity issue. Nevertheless, this may result in the loss of useful information owing to the average operation. Therefore, in this paper, a multi-prototype-guided federated knowledge distillation (MP-FedKD) approach is proposed. Particularly, self-knowledge distillation is integrated into FL to deal with the non-IID issue. To cope with the problem of information loss caused by single prototype-based strategy, multi-prototype strategy is adopted, where we present a conditional hierarchical agglomerative clustering (CHAC) approach and a prototype alignment scheme. Additionally, we design a novel loss function (called LEMGP loss) for each local client, where the relationship between global prototypes and local embedding will be focused. Extensive experiments over multiple datasets with various non-IID settings showcase that the proposed MP-FedKD approach outperforms the considered state-of-the-art baselines regarding accuracy, average accuracy and errors (RMSE and MAE).
AIJul 31, 2024
FSSC: Federated Learning of Transformer Neural Networks for Semantic Image CommunicationYuna Yan, Xin Zhang, Lixin Li et al.
In this paper, we address the problem of image semantic communication in a multi-user deployment scenario and propose a federated learning (FL) strategy for a Swin Transformer-based semantic communication system (FSSC). Firstly, we demonstrate that the adoption of a Swin Transformer for joint source-channel coding (JSCC) effectively extracts semantic information in the communication system. Next, the FL framework is introduced to collaboratively learn a global model by aggregating local model parameters, rather than directly sharing clients' data. This approach enhances user privacy protection and reduces the workload on the server or mobile edge. Simulation evaluations indicate that our method outperforms the typical JSCC algorithm and traditional separate-based communication algorithms. Particularly after integrating local semantics, the global aggregation model has further increased the Peak Signal-to-Noise Ratio (PSNR) by more than 2dB, thoroughly proving the effectiveness of our algorithm.
CVMar 10
Agentic AI as a Network Control-Plane Intelligence Layer for Federated Learning over 6GLoc X. Nguyen, Ji Su Yoon, Huy Q. Le et al.
The shift toward user-customized on-device learning places new demands on wireless systems: models must be trained on diverse, distributed data while meeting strict latency, bandwidth, and reliability constraints. To address this, we propose an Agentic AI as the control layer for managing federated learning (FL) over 6G networks, which translates high-level task goals into actions that are aware of network conditions. Rather than simply viewing FL as a learning challenge, our system sees it as a combined task of learning and network management. A set of specialized agents focused on retrieval, planning, coding, and evaluation utilizes monitoring tools and optimization methods to handle client selection, incentive structuring, scheduling, resource allocation, adaptive local training, and code generation. The use of closed-loop evaluation and memory allows the system to consistently refine its decisions, taking into account varying signal-to-noise ratios, bandwidth conditions, and device capabilities. Finally, our case study has demonstrated the effectiveness of the Agentic AI system's use of tools for achieving high performance.
NIAug 13, 2024
IRS-Assisted Lossy Communications Under Correlated Rayleigh Fading: Outage Probability Analysis and OptimizationGuanchang Li, Wensheng Lin, Lixin Li et al.
This paper focuses on an intelligent reflecting surface (IRS)-assisted lossy communication system with correlated Rayleigh fading. We analyze the correlated channel model and derive the outage probability of the system. Then, we design a deep reinforce learning (DRL) method to optimize the phase shift of IRS, in order to maximize the received signal power. Moreover, this paper presents results of the simulations conducted to evaluate the performance of the DRL-based method. The simulation results indicate that the outage probability of the considered system increases significantly with more correlated channel coefficients. Moreover, the performance gap between DRL and theoretical limit increases with higher transmit power and/or larger distortion requirement.
NIMar 25
Spatio-Temporal Semantic Inference for Resilient 6G HRLLC in the Low-Altitude EconomyChuan-Chi Lai, Ang-Hsun Tsai, Zhu Han
The rapid expansion of the Low-Altitude Economy (LAE) necessitates highly reliable coordination among autonomous aerial agents (AAAs). Traditional reactive communication paradigms in 6G networks are increasingly susceptible to stochastic network jitter and intermittent signaling silence, especially within complex urban canyon environments. To address this connectivity gap, this paper introduces the Embodied Proactive Inference for Coordination (EPIC) framework, featuring a Spatio-Temporal Semantic Inference (STSI) operator designed to decouple the coordination loop from physical signaling fluctuations. By projecting stale peer observations into a proactive belief manifold, EPIC maintains a deterministic reaction latency regardless of the network state. Extensive simulations demonstrate that EPIC achieves an average 93.5% reduction in end-to-end reaction latency, masking physical transmission delays of 150 ms with a deterministic 10 ms execution heartbeat. Crucially, EPIC exhibits strategic immunity to escalating network jitter up to 100 ms and improves the Weighted Coverage Efficiency (WCE) by 10.5% during extreme signaling silence lasting up to 50 s. These results provide the deterministic resilience essential for 6G Hyper-Reliable and Low-Latency Communication (HRLLC).
ROFeb 23
Large Language Model-Assisted UAV Operations and Communications: A Multifaceted Survey and TutorialYousef Emami, Hao Zhou, Radha Reddy et al.
Uncrewed Aerial Vehicles (UAVs) are widely deployed across diverse applications due to their mobility and agility. Recent advances in Large Language Models (LLMs) offer a transformative opportunity to enhance UAV intelligence beyond conventional optimization-based and learning-based approaches. By integrating LLMs into UAV systems, advanced environmental understanding, swarm coordination, mobility optimization, and high-level task reasoning can be achieved, thereby allowing more adaptive and context-aware aerial operations. This survey systematically explores the intersection of LLMs and UAV technologies and proposes a unified framework that consolidates existing architectures, methodologies, and applications for UAVs. We first present a structured taxonomy of LLM adaptation techniques for UAVs, including pretraining, fine-tuning, Retrieval-Augmented Generation (RAG), and prompt engineering, along with key reasoning capabilities such as Chain-of-Thought (CoT) and In-Context Learning (ICL). We then examine LLM-assisted UAV communications and operations, covering navigation, mission planning, swarm control, safety, autonomy, and network management. After that, the survey further discusses Multimodal LLMs (MLLMs) for human-swarm interaction, perception-driven navigation, and collaborative control. Finally, we address ethical considerations, including bias, transparency, accountability, and Human-in-the-Loop (HITL) strategies, and outline future research directions. Overall, this work positions LLM-assisted UAVs as a foundation for intelligent and adaptive aerial systems.
SPMay 14
Deep Mixture of Experts Network for Resource Optimization in Aerial-Terrestrial CF-mMIMO Systems under URLLCDonggen Li, Chong Huang, Jingfu Li et al.
As a critical component of sixth-generation (6G) wireless networks, ultra-reliable and low-latency communication (URLLC) is expected to support real-time and reliable information exchange in low-altitude environments. However, achieving URLLC often incurs significant resource overhead, including increased bandwidth consumption, higher transmit power, and denser access point (AP) deployment, which pose significant challenges to both spectral efficiency (SE) and energy efficiency (EE). Besides, existing iterative optimization algorithms are computationally intensive and struggle to meet the latency requirements of URLLC. To address these challenges, we propose a hybrid aerial-terrestrial cell-free massive MIMO (CF-mMIMO) network to support diverse services, along with a channel prediction network and a deep mixture of experts (MoE) network for uplink optimization. First, we design a channel prediction network (CP-Net) to mitigate channel aging caused by high-mobility user equipment (UE). CP-Net employs three Transformer-based sub-networks for aged channel state information (CSI) prediction, while a channel quality-aware loss function is introduced to improve the prediction accuracy of weak links. Based on the predicted CSI, we develop a deep MoE network (MoE-Net) for power allocation comprising three expert models targeting different objectives. Then, we introduce a weighted gating network (WT-Net) to learn an efficient adaptive combination of expert outputs. The proposed framework better captures heterogeneous UE requirements and improves communication performance under URLLC constraints. Numerical results demonstrate the effectiveness of the proposed method.
IVMay 12
On Privacy-Preserving Image Transmission in Low-Altitude Networks: A Swin Transformer-Based Framework with Federated LearningKexin Zhang, Lixin Li, Yuna Yan et al.
The rapid development of low-altitude economy has driven the proliferation of Unmanned Aerial Vehicle (UAV) applications, including logistics, inspection, and emergency response. However, transmitting high-volume image data from UAVs to ground stations faces significant challenges due to limited bandwidth and stringent privacy requirements. To address these issues, a Semantic Communication (SC) framework based on Federated Learning (FL) is proposed for efficient and privacy-preserving image transmission. A Swin Transformer-based Semantic Communication (STSC) architecture is designed to extract multi-scale semantic features under constrained bandwidth conditions. Dedicated communication and computing nodes are deployed on UAVs to enhance real-time coverage and flexibility. Meanwhile, a FL mechanism enables global model training across distributed devices without sharing raw data, thus preserving user privacy. Simulation experiments conducted on the CIFAR-10 dataset demonstrate that the proposed STSC framework achieves at least 5.7 dB improvement in Peak Signal-to-Noise Ratio (PSNR) compared to DeepJSCC baselines, while also showing superior convergence and generalization performance. The framework effectively integrates UAV-assisted deployment with SC and privacy protection, offering a practical solution for bandwidth-constrained image transmission in low-altitude networks.
ITMay 12
Resolution Information: Limits of Ambiguity Resolution for Generative CommunicationAngeles Vazquez-Castro, Faheem Dustin Quazi, Zhu Han
In generative communication, the transmitter sends a compact generative description, such as model parameters or a latent representation, rather than raw data. The receiver uses this description to form a posterior belief over the underlying state and to resolve semantic ambiguity: which interpretation, decision, or action is supported by the received representation? Inspired by Shannon's geometric view of communication as uncertainty resolution, we introduce resolution information as the minimum information update, measured in nats, required to move the receiver's posterior belief into a low-ambiguity semantic region. Our work yields three main results. First, when the receiver can form any posterior belief, corresponding to the ideal unconstrained case, resolution information reduces to a binary divergence that depends only on each region's prior probability. In this case, the shape of the regions is irrelevant. Under repeated sampling, ambiguity decays exponentially with an exponent equal to the resolution information, giving it an operational meaning as an ambiguity exponent. Second, when the generative representation constrains the posterior family, as in practice, geometry becomes operational and can create irreducible ambiguity floors: half-spaces remain resolvable, whereas polytope-type regions can exhibit residual ambiguity that no amount of additional information can remove. These results reveal a fundamental departure from classical channel coding. In Shannon theory, codes can be designed so that decoding regions separate messages and error probability vanishes below capacity. In generative communication, the model itself induces a constrained posterior geometry that may prevent asymptotic ambiguity resolution. The resulting limit is not on rate, but on resolvability itself.
CEMay 11
Matching-with-Contracts for the AI-RAN Market: AIGC-as-a-Service for TeleoperationZijun Zhan, Yaxian Dong, Daniel Mawunyo Doe et al.
Artificial intelligence radio access networks (AI-RANs) are a promising architecture for bolstering the prosperity of the edge AI ecosystem. A well-designed incentive mechanism can further ensure the sustainable development of this ecosystem. However, incentive mechanism design faces two major challenges: 1) information asymmetry, where AI-RAN operators have only partial knowledge of AI users' utility functions, and 2) competition, as multiple AI-RAN operators coexist in real-world markets. Remarkably, chaotic and adversarial competition might compromise AI-RAN operators' utility. To this end, we develop a matching-with-contracts framework for incentive mechanism design in AI-RAN service markets. The framework extends the static matching-with-contracts model by jointly characterizing the contract design of multiple competitive operators, user-operator matching, and dynamic evolution of the market state. Specifically, the incentive mechanism offered by each AI-RAN operator takes the form of a contract menu, where each contract item consists of an AI service latency agreement and a corresponding price. We model the AI service process as three independent queues and characterize the violation probability of the latency agreement using queueing theory and the Chernoff bound. To derive an effective incentive mechanism, we further propose a mixed stable matching-with-contracts algorithm that jointly updates user-side matching decisions and operator-side contract menus. Simulation results for a teleoperation-oriented AIGC service demonstrate the effectiveness and robustness of the proposed method. Compared with benchmark schemes, our method improves the total utility of AI-RAN operators by at least 56.8\% under representative settings.
SDDec 4, 2025
Large Speech Model Enabled Semantic CommunicationYun Tian, Zhijin Qin, Guocheng Lv et al.
Existing speech semantic communication systems mainly based on Joint Source-Channel Coding (JSCC) architectures have demonstrated impressive performance, but their effectiveness remains limited by model structures specifically designed for particular tasks and datasets. Recent advances indicate that generative large models pre-trained on massive datasets, can achieve outstanding performance arexhibit exceptional performance across diverse downstream tasks with minimal fine-tuning. To exploit the rich semantic knowledge embedded in large models and enable adaptive transmission over lossy channels, we propose a Large Speech Model enabled Semantic Communication (LargeSC) system. Simultaneously achieving adaptive compression and robust transmission over lossy channels remains challenging, requiring trade-offs among compression efficiency, speech quality, and latency. In this work, we employ the Mimi as a speech codec, converting speech into discrete tokens compatible with existing network architectures. We propose an adaptive controller module that enables adaptive transmission and in-band Unequal Error Protection (UEP), dynamically adjusting to both speech content and packet loss probability under bandwidth constraints. Additionally, we employ Low-Rank Adaptation (LoRA) to finetune the Moshi foundation model for generative recovery of lost speech tokens. Simulation results show that the proposed system supports bandwidths ranging from 550 bps to 2.06 kbps, outperforms conventional baselines in speech quality under high packet loss rates and achieves an end-to-end latency of approximately 460 ms, thereby demonstrating its potential for real-time deployment.
CVJul 7, 2021Code
SpectralFormer: Rethinking Hyperspectral Image Classification with TransformersDanfeng Hong, Zhu Han, Jing Yao et al.
Hyperspectral (HS) images are characterized by approximately contiguous spectral information, enabling the fine identification of materials by capturing subtle spectral discrepancies. Owing to their excellent locally contextual modeling ability, convolutional neural networks (CNNs) have been proven to be a powerful feature extractor in HS image classification. However, CNNs fail to mine and represent the sequence attributes of spectral signatures well due to the limitations of their inherent network backbone. To solve this issue, we rethink HS image classification from a sequential perspective with transformers, and propose a novel backbone network called \ul{SpectralFormer}. Beyond band-wise representations in classic transformers, SpectralFormer is capable of learning spectrally local sequence information from neighboring bands of HS images, yielding group-wise spectral embeddings. More significantly, to reduce the possibility of losing valuable information in the layer-wise propagation process, we devise a cross-layer skip connection to convey memory-like components from shallow to deep layers by adaptively learning to fuse "soft" residuals across layers. It is worth noting that the proposed SpectralFormer is a highly flexible backbone network, which can be applicable to both pixel- and patch-wise inputs. We evaluate the classification performance of the proposed SpectralFormer on three HS datasets by conducting extensive experiments, showing the superiority over classic transformers and achieving a significant improvement in comparison with state-of-the-art backbone networks. The codes of this work will be available at https://github.com/danfenghong/IEEE_TGRS_SpectralFormer for the sake of reproducibility.
AINov 23, 2023
Scalable AI Generative Content for Vehicular Network Semantic CommunicationHao Feng, Yi Yang, Zhu Han
Perceiving vehicles in a driver's blind spot is vital for safe driving. The detection of potentially dangerous vehicles in these blind spots can benefit from vehicular network semantic communication technology. However, efficient semantic communication involves a trade-off between accuracy and delay, especially in bandwidth-limited situations. This paper unveils a scalable Artificial Intelligence Generated Content (AIGC) system that leverages an encoder-decoder architecture. This system converts images into textual representations and reconstructs them into quality-acceptable images, optimizing transmission for vehicular network semantic communication. Moreover, when bandwidth allows, auxiliary information is integrated. The encoder-decoder aims to maintain semantic equivalence with the original images across various tasks. Then the proposed approach employs reinforcement learning to enhance the reliability of the generated contents. Experimental results suggest that the proposed method surpasses the baseline in perceiving vehicles in blind spots and effectively compresses communication data. While this method is specifically designed for driving scenarios, this encoder-decoder architecture also holds potential for wide use across various semantic communication scenarios.
AIJan 15, 2024
When Large Language Model Agents Meet 6G Networks: Perception, Grounding, and AlignmentMinrui Xu, Dusit Niyato, Jiawen Kang et al.
AI agents based on multimodal large language models (LLMs) are expected to revolutionize human-computer interaction and offer more personalized assistant services across various domains like healthcare, education, manufacturing, and entertainment. Deploying LLM agents in 6G networks enables users to access previously expensive AI assistant services via mobile devices democratically, thereby reducing interaction latency and better preserving user privacy. Nevertheless, the limited capacity of mobile devices constrains the effectiveness of deploying and executing local LLMs, which necessitates offloading complex tasks to global LLMs running on edge servers during long-horizon interactions. In this article, we propose a split learning system for LLM agents in 6G networks leveraging the collaboration between mobile devices and edge servers, where multiple LLMs with different roles are distributed across mobile devices and edge servers to perform user-agent interactive tasks collaboratively. In the proposed system, LLM agents are split into perception, grounding, and alignment modules, facilitating inter-module communications to meet extended user requirements on 6G network functions, including integrated sensing and communication, digital twins, and task-oriented communications. Furthermore, we introduce a novel model caching algorithm for LLMs within the proposed system to improve model utilization in context, thus reducing network costs of the collaborative mobile and edge LLM agents.
AIJan 27
CASTER: Breaking the Cost-Performance Barrier in Multi-Agent Orchestration via Context-Aware Strategy for Task Efficient RoutingShanyv Liu, Xuyang Yuan, Tao Chen et al.
Graph-based Multi-Agent Systems (MAS) enable complex cyclic workflows but suffer from inefficient static model allocation, where deploying strong models uniformly wastes computation on trivial sub-tasks. We propose CASTER (Context-Aware Strategy for Task Efficient Routing), a lightweight router for dynamic model selection in graph-based MAS. CASTER employs a Dual-Signal Router that combines semantic embeddings with structural meta-features to estimate task difficulty. During training, the router self-optimizes through a Cold Start to Iterative Evolution paradigm, learning from its own routing failures via on-policy negative feedback. Experiments using LLM-as-a-Judge evaluation across Software Engineering, Data Analysis, Scientific Discovery, and Cybersecurity demonstrate that CASTER reduces inference cost by up to 72.4% compared to strong-model baselines while matching their success rates, and consistently outperforms both heuristic routing and FrugalGPT across all domains.
AIApr 29, 2024
Artificial General Intelligence (AGI)-Native Wireless Systems: A Journey Beyond 6GWalid Saad, Omar Hashash, Christo Kurisummoottil Thomas et al.
Building future wireless systems that support services like digital twins (DTs) is challenging to achieve through advances to conventional technologies like meta-surfaces. While artificial intelligence (AI)-native networks promise to overcome some limitations of wireless technologies, developments still rely on AI tools like neural networks. Such tools struggle to cope with the non-trivial challenges of the network environment and the growing demands of emerging use cases. In this paper, we revisit the concept of AI-native wireless systems, equipping them with the common sense necessary to transform them into artificial general intelligence (AGI)-native systems. These systems acquire common sense by exploiting different cognitive abilities such as perception, analogy, and reasoning, that enable them to generalize and deal with unforeseen scenarios. Towards developing the components of such a system, we start by showing how the perception module can be built through abstracting real-world elements into generalizable representations. These representations are then used to create a world model, founded on principles of causality and hyper-dimensional (HD) computing, that aligns with intuitive physics and enables analogical reasoning, that define common sense. Then, we explain how methods such as integrated information theory play a role in the proposed intent-driven and objective-driven planning methods that maneuver the AGI-native network to take actions. Next, we discuss how an AGI-native network can enable use cases related to human and autonomous agents: a) analogical reasoning for next-generation DTs, b) synchronized and resilient experiences for cognitive avatars, and c) brain-level metaverse experiences like holographic teleportation. Finally, we conclude with a set of recommendations to build AGI-native systems. Ultimately, we envision this paper as a roadmap for the beyond 6G era.
CVNov 11, 2025
Contrastive Integrated Gradients: A Feature Attribution-Based Method for Explaining Whole Slide Image ClassificationAnh Mai Vu, Tuan L. Vo, Ngoc Lam Quang Bui et al.
Interpretability is essential in Whole Slide Image (WSI) analysis for computational pathology, where understanding model predictions helps build trust in AI-assisted diagnostics. While Integrated Gradients (IG) and related attribution methods have shown promise, applying them directly to WSIs introduces challenges due to their high-resolution nature. These methods capture model decision patterns but may overlook class-discriminative signals that are crucial for distinguishing between tumor subtypes. In this work, we introduce Contrastive Integrated Gradients (CIG), a novel attribution method that enhances interpretability by computing contrastive gradients in logit space. First, CIG highlights class-discriminative regions by comparing feature importance relative to a reference class, offering sharper differentiation between tumor and non-tumor areas. Second, CIG satisfies the axioms of integrated attribution, ensuring consistency and theoretical soundness. Third, we propose two attribution quality metrics, MIL-AIC and MIL-SIC, which measure how predictive information and model confidence evolve with access to salient regions, particularly under weak supervision. We validate CIG across three datasets spanning distinct cancer types: CAMELYON16 (breast cancer metastasis in lymph nodes), TCGA-RCC (renal cell carcinoma), and TCGA-Lung (lung cancer). Experimental results demonstrate that CIG yields more informative attributions both quantitatively, using MIL-AIC and MIL-SIC, and qualitatively, through visualizations that align closely with ground truth tumor regions, underscoring its potential for interpretable and trustworthy WSI-based diagnostics
DCMay 18, 2024
Cooperative Cognitive Dynamic System in UAV Swarms: Reconfigurable Mechanism and FrameworkZiye Jia, Jiahao You, Chao Dong et al.
As the demands for immediate and effective responses increase in both civilian and military domains, the unmanned aerial vehicle (UAV) swarms emerge as effective solutions, in which multiple cooperative UAVs can work together to achieve specific goals. However, how to manage such complex systems to ensure real-time adaptability lack sufficient researches. Hence, in this paper, we propose the cooperative cognitive dynamic system (CCDS), to optimize the management for UAV swarms. CCDS leverages a hierarchical and cooperative control structure that enables real-time data processing and decision. Accordingly, CCDS optimizes the UAV swarm management via dynamic reconfigurability and adaptive intelligent optimization. In addition, CCDS can be integrated with the biomimetic mechanism to efficiently allocate tasks for UAV swarms. Further, the distributed coordination of CCDS ensures reliable and resilient control, thus enhancing the adaptability and robustness. Finally, the potential challenges and future directions are analyzed, to provide insights into managing UAV swarms in dynamic heterogeneous networking.
CVDec 11, 2025
ConStruct: Structural Distillation of Foundation Models for Prototype-Based Weakly Supervised Histopathology SegmentationKhang Le, Ha Thach, Anh M. Vu et al.
Weakly supervised semantic segmentation (WSSS) in histopathology relies heavily on classification backbones, yet these models often localize only the most discriminative regions and struggle to capture the full spatial extent of tissue structures. Vision-language models such as CONCH offer rich semantic alignment and morphology-aware representations, while modern segmentation backbones like SegFormer preserve fine-grained spatial cues. However, combining these complementary strengths remains challenging, especially under weak supervision and without dense annotations. We propose a prototype learning framework for WSSS in histopathological images that integrates morphology-aware representations from CONCH, multi-scale structural cues from SegFormer, and text-guided semantic alignment to produce prototypes that are simultaneously semantically discriminative and spatially coherent. To effectively leverage these heterogeneous sources, we introduce text-guided prototype initialization that incorporates pathology descriptions to generate more complete and semantically accurate pseudo-masks. A structural distillation mechanism transfers spatial knowledge from SegFormer to preserve fine-grained morphological patterns and local tissue boundaries during prototype learning. Our approach produces high-quality pseudo masks without pixel-level annotations, improves localization completeness, and enhances semantic consistency across tissue types. Experiments on BCSS-WSSS datasets demonstrate that our prototype learning framework outperforms existing WSSS methods while remaining computationally efficient through frozen foundation model backbones and lightweight trainable adapters.
CVDec 11, 2025
DualProtoSeg: Simple and Efficient Design with Text- and Image-Guided Prototype Learning for Weakly Supervised Histopathology Image SegmentationAnh M. Vu, Khang P. Le, Trang T. K. Vo et al.
Weakly supervised semantic segmentation (WSSS) in histopathology seeks to reduce annotation cost by learning from image-level labels, yet it remains limited by inter-class homogeneity, intra-class heterogeneity, and the region-shrinkage effect of CAM-based supervision. We propose a simple and effective prototype-driven framework that leverages vision-language alignment to improve region discovery under weak supervision. Our method integrates CoOp-style learnable prompt tuning to generate text-based prototypes and combines them with learnable image prototypes, forming a dual-modal prototype bank that captures both semantic and appearance cues. To address oversmoothing in ViT representations, we incorporate a multi-scale pyramid module that enhances spatial precision and improves localization quality. Experiments on the BCSS-WSSS benchmark show that our approach surpasses existing state-of-the-art methods, and detailed analyses demonstrate the benefits of text description diversity, context length, and the complementary behavior of text and image prototypes. These results highlight the effectiveness of jointly leveraging textual semantics and visual prototype learning for WSSS in digital pathology.
NIFeb 24, 2025
Toward Agentic AI: Generative Information Retrieval Inspired Intelligent Communications and NetworkingRuichen Zhang, Shunpu Tang, Yinqiu Liu et al.
The increasing complexity and scale of modern telecommunications networks demand intelligent automation to enhance efficiency, adaptability, and resilience. Agentic AI has emerged as a key paradigm for intelligent communications and networking, enabling AI-driven agents to perceive, reason, decide, and act within dynamic networking environments. However, effective decision-making in telecom applications, such as network planning, management, and resource allocation, requires integrating retrieval mechanisms that support multi-hop reasoning, historical cross-referencing, and compliance with evolving 3GPP standards. This article presents a forward-looking perspective on generative information retrieval-inspired intelligent communications and networking, emphasizing the role of knowledge acquisition, processing, and retrieval in agentic AI for telecom systems. We first provide a comprehensive review of generative information retrieval strategies, including traditional retrieval, hybrid retrieval, semantic retrieval, knowledge-based retrieval, and agentic contextual retrieval. We then analyze their advantages, limitations, and suitability for various networking scenarios. Next, we present a survey about their applications in communications and networking. Additionally, we introduce an agentic contextual retrieval framework to enhance telecom-specific planning by integrating multi-source retrieval, structured reasoning, and self-reflective validation. Experimental results demonstrate that our framework significantly improves answer accuracy, explanation consistency, and retrieval efficiency compared to traditional and semantic retrieval methods. Finally, we outline future research directions.
NIApr 28, 2024
Generative AI for Low-Carbon Artificial Intelligence of Things with Large Language ModelsJinbo Wen, Ruichen Zhang, Dusit Niyato et al.
By integrating Artificial Intelligence (AI) with the Internet of Things (IoT), Artificial Intelligence of Things (AIoT) has revolutionized many fields. However, AIoT is facing the challenges of energy consumption and carbon emissions due to the continuous advancement of mobile technology. Fortunately, Generative AI (GAI) holds immense potential to reduce carbon emissions of AIoT due to its excellent reasoning and generation capabilities. In this article, we explore the potential of GAI for carbon emissions reduction and propose a novel GAI-enabled solution for low-carbon AIoT. Specifically, we first study the main impacts that cause carbon emissions in AIoT, and then introduce GAI techniques and their relations to carbon emissions. We then explore the application prospects of GAI in low-carbon AIoT, focusing on how GAI can reduce carbon emissions of network components. Subsequently, we propose a Large Language Model (LLM)-enabled carbon emission optimization framework, in which we design pluggable LLM and Retrieval Augmented Generation (RAG) modules to generate more accurate and reliable optimization problems. Furthermore, we utilize Generative Diffusion Models (GDMs) to identify optimal strategies for carbon emission reduction. Numerical results demonstrate the effectiveness of the proposed framework. Finally, we insightfully provide open research directions for low-carbon AIoT.
LGApr 8
A Novel Edge-Assisted Quantum-Classical Hybrid Framework for Crime Pattern Learning and ClassificationNiloy Das, Apurba Adhikary, Sheikh Salman Hassan et al.
Crime pattern analysis is critical for law enforcement and predictive policing, yet the surge in criminal activities from rapid urbanization creates high-dimensional, imbalanced datasets that challenge traditional classification methods. This study presents a quantum-classical comparison framework for crime analytics, evaluating four computational paradigms: quantum models, classical baseline machine learning models, and two hybrid quantum-classical architectures. Using 16-year Bangladesh crime statistics, we systematically assess classification performance and computational efficiency under rigorous cross-validation methods. Experimental results show that quantum-inspired approaches, particularly QAOA, achieve up to 84.6% accuracy, while requiring fewer trainable parameters than classical baselines, suggesting practical advantages for memory-constrained edge deployment. The proposed correlation-aware circuit design demonstrates the potential of incorporating domain-specific feature relationships into quantum models. Furthermore, hybrid approaches exhibit competitive training efficiency, making them suitable candidates for resource-constrained environments. The framework's low computational overhead and compact parameter footprint suggest potential advantages for wireless sensor network deployments in smart city surveillance systems, where distributed nodes perform localized crime analytics with minimal communication costs. Our findings provide a preliminary empirical assessment of quantum-enhanced machine learning for structured crime data and motivate further investigation with larger datasets and realistic quantum hardware considerations.
NIMar 6, 2025
Large-Scale AI in Telecom: Charting the Roadmap for Innovation, Scalability, and Enhanced Digital ExperiencesAdnan Shahid, Adrian Kliks, Ahmed Al-Tahmeesschi et al.
This white paper discusses the role of large-scale AI in the telecommunications industry, with a specific focus on the potential of generative AI to revolutionize network functions and user experiences, especially in the context of 6G systems. It highlights the development and deployment of Large Telecom Models (LTMs), which are tailored AI models designed to address the complex challenges faced by modern telecom networks. The paper covers a wide range of topics, from the architecture and deployment strategies of LTMs to their applications in network management, resource allocation, and optimization. It also explores the regulatory, ethical, and standardization considerations for LTMs, offering insights into their future integration into telecom infrastructure. The goal is to provide a comprehensive roadmap for the adoption of LTMs to enhance scalability, performance, and user-centric innovation in telecom networks.