NIJun 4
Compact LLM Deployment and World Model Assisted Offloading in Mobile Edge ComputingRuichen Zhang, Xiaofeng Luo, Jiayi He et al.
This paper investigates compact large language model (LLM) deployment and world-model-assisted inference offloading in mobile edge computing (MEC) networks. We first propose an edge compact LLM deployment (ECLD) framework that jointly applies structured pruning, low-bit quantization, and knowledge distillation to construct edge-deployable LLM variants, and we evaluate these models using four complementary metrics: accessibility, energy consumption, hallucination rate, and generalization accuracy. Building on the resulting compact models, we formulate an MEC offloading optimization problem that minimizes the long-term average inference latency subject to per-device energy budgets and LLM-specific quality-of-service constraints on effective accuracy and hallucination. To solve this problem under unknown and time-varying network dynamics, we develop a world model-proximal policy optimization (PPO) algorithm, which augments an on-policy PPO algorithm with a learned recurrent world model that provides improved value targets and short imagination rollouts. Extensive experiments on Llama-3.1-8B, Qwen3-8B, and Mistral-12B show that ECLD compresses base models by about 70-80% in storage (i.e., from 15.3 GB to 3.3 GB for Llama-3.1-8B) and reduces per-query energy consumption by up to 50%, while largely preserving accuracy and often lowering hallucination compared with quantization-only or pruning-only baselines. Moreover, they also show that world model-PPO speeds up convergence by about 50%, improves the final reward by 15.8% over vanilla PPO, and reduces average inference latency by 12-30% across different user populations, while satisfying the accuracy and hallucination constraints and approaching the generation quality of always-offloading with much of the efficiency of local execution.
AIAug 10, 2022Code
Attention-aware Resource Allocation and QoE Analysis for Metaverse xURLLC ServicesHongyang Du, Jiazhen Liu, Dusit Niyato et al.
Metaverse encapsulates our expectations of the next-generation Internet, while bringing new key performance indicators (KPIs). Although conventional ultra-reliable and low-latency communications (URLLC) can satisfy objective KPIs, it is difficult to provide a personalized immersive experience that is a distinctive feature of the Metaverse. Since the quality of experience (QoE) can be regarded as a comprehensive KPI, the URLLC is evolved towards the next generation URLLC (xURLLC) with a personalized resource allocation scheme to achieve higher QoE. To deploy Metaverse xURLLC services, we study the interaction between the Metaverse service provider (MSP) and the network infrastructure provider (InP), and provide an optimal contract design framework. Specifically, the utility of the MSP, defined as a function of Metaverse users' QoE, is to be maximized, while ensuring the incentives of the InP. To model the QoE mathematically, we propose a novel metric named Meta-Immersion that incorporates both the objective KPIs and subjective feelings of Metaverse users. Furthermore, we develop an attention-aware rendering capacity allocation scheme to improve QoE in xURLLC. Using a user-object-attention level dataset, we validate that the xURLLC can achieve an average of 20.1% QoE improvement compared to the conventional URLLC with a uniform resource allocation scheme. The code for this paper is available at https://github.com/HongyangDu/AttentionQoE
AIJan 9, 2023
Enabling AI-Generated Content (AIGC) Services in Wireless Edge NetworksHongyang Du, Zonghang Li, Dusit Niyato et al.
Artificial Intelligence-Generated Content (AIGC) refers to the use of AI to automate the information creation process while fulfilling the personalized requirements of users. However, due to the instability of AIGC models, e.g., the stochastic nature of diffusion models, the quality and accuracy of the generated content can vary significantly. In wireless edge networks, the transmission of incorrectly generated content may unnecessarily consume network resources. Thus, a dynamic AIGC service provider (ASP) selection scheme is required to enable users to connect to the most suited ASP, improving the users' satisfaction and quality of generated content. In this article, we first review the AIGC techniques and their applications in wireless networks. We then present the AIGC-as-a-service (AaaS) concept and discuss the challenges in deploying AaaS at the edge networks. Yet, it is essential to have performance metrics to evaluate the accuracy of AIGC services. Thus, we introduce several image-based perceived quality evaluation metrics. Then, we propose a general and effective model to illustrate the relationship between computational resources and user-perceived quality evaluation metrics. To achieve efficient AaaS and maximize the quality of generated content in wireless edge networks, we propose a deep reinforcement learning-enabled algorithm for optimal ASP selection. Simulation results show that the proposed algorithm can provide a higher quality of generated content to users and achieve fewer crashed tasks by comparing with four benchmarks, i.e., overloading-avoidance, random, round-robin policies, and the upper-bound schemes.
AIFeb 16, 2023
Generative AI-empowered Simulation for Autonomous Driving in Vehicular Mixed Reality MetaversesMinrui Xu, Dusit Niyato, Junlong Chen et al.
In the vehicular mixed reality (MR) Metaverse, the distance between physical and virtual entities can be overcome by fusing the physical and virtual environments with multi-dimensional communications in autonomous driving systems. Assisted by digital twin (DT) technologies, connected autonomous vehicles (AVs), roadside units (RSU), and virtual simulators can maintain the vehicular MR Metaverse via digital simulations for sharing data and making driving decisions collaboratively. However, large-scale traffic and driving simulation via realistic data collection and fusion from the physical world for online prediction and offline training in autonomous driving systems are difficult and costly. In this paper, we propose an autonomous driving architecture, where generative AI is leveraged to synthesize unlimited conditioned traffic and driving data in simulations for improving driving safety and traffic efficiency. First, we propose a multi-task DT offloading model for the reliable execution of heterogeneous DT tasks with different requirements at RSUs. Then, based on the preferences of AV's DTs and collected realistic data, virtual simulators can synthesize unlimited conditioned driving and traffic datasets to further improve robustness. Finally, we propose a multi-task enhanced auction-based mechanism to provide fine-grained incentives for RSUs in providing resources for autonomous driving. The property analysis and experimental results demonstrate that the proposed mechanism and architecture are strategy-proof and effective, respectively.
NIMar 30, 2023
Deep Generative Model and Its Applications in Efficient Wireless Network Management: A Tutorial and Case StudyYinqiu Liu, Hongyang Du, Dusit Niyato et al.
With the phenomenal success of diffusion models and ChatGPT, deep generation models (DGMs) have been experiencing explosive growth from 2022. Not limited to content generation, DGMs are also widely adopted in Internet of Things, Metaverse, and digital twin, due to their outstanding ability to represent complex patterns and generate plausible samples. In this article, we explore the applications of DGMs in a crucial task, i.e., improving the efficiency of wireless network management. Specifically, we firstly overview the generative AI, as well as three representative DGMs. Then, a DGM-empowered framework for wireless network management is proposed, in which we elaborate the issues of the conventional network management approaches, why DGMs can address them efficiently, and the step-by-step workflow for applying DGMs in managing wireless networks. Moreover, we conduct a case study on network economics, using the state-of-the-art DGM model, i.e., diffusion model, to generate effective contracts for incentivizing the mobile AI-Generated Content (AIGC) services. Last but not least, we discuss important open directions for the further research.
IVSep 5, 2023
Generative AI-aided Joint Training-free Secure Semantic Communications via Multi-modal PromptsHongyang Du, Guangyuan Liu, Dusit Niyato et al.
Semantic communication (SemCom) holds promise for reducing network resource consumption while achieving the communications goal. However, the computational overheads in jointly training semantic encoders and decoders-and the subsequent deployment in network devices-are overlooked. Recent advances in Generative artificial intelligence (GAI) offer a potential solution. The robust learning abilities of GAI models indicate that semantic decoders can reconstruct source messages using a limited amount of semantic information, e.g., prompts, without joint training with the semantic encoder. A notable challenge, however, is the instability introduced by GAI's diverse generation ability. This instability, evident in outputs like text-generated images, limits the direct application of GAI in scenarios demanding accurate message recovery, such as face image transmission. To solve the above problems, this paper proposes a GAI-aided SemCom system with multi-model prompts for accurate content decoding. Moreover, in response to security concerns, we introduce the application of covert communications aided by a friendly jammer. The system jointly optimizes the diffusion step, jamming, and transmitting power with the aid of the generative diffusion models, enabling successful and secure transmission of the source messages.
AINov 29, 2022
When Quantum Information Technologies Meet Blockchain in Web 3.0Minrui Xu, Xiaoxu Ren, Dusit Niyato et al.
With the drive to create a decentralized digital economy, Web 3.0 has become a cornerstone of digital transformation, developed on the basis of computing-force networking, distributed data storage, and blockchain. With the rapid realization of quantum devices, Web 3.0 is being developed in parallel with the deployment of quantum cloud computing and quantum Internet. In this regard, quantum computing first disrupts the original cryptographic systems that protect data security while reshaping modern cryptography with the advantages of quantum computing and communication. Therefore, in this paper, we introduce a quantum blockchain-driven Web 3.0 framework that provides information-theoretic security for decentralized data transferring and payment transactions. First, we present the framework of quantum blockchain-driven Web 3.0 with future-proof security during the transmission of data and transaction information. Next, we discuss the potential applications and challenges of implementing quantum blockchain in Web 3.0. Finally, we describe a use case for quantum non-fungible tokens (NFTs) and propose a quantum deep learning-based optimal auction for NFT trading to maximize the achievable revenue for sufficient liquidity in Web 3.0. In this way, the proposed framework can achieve proven security and sustainability for the next-generation decentralized digital society.
GTJul 29, 2023
Blockchain-empowered Federated Learning for Healthcare Metaverses: User-centric Incentive Mechanism with Optimal Data FreshnessJiawen Kang, Jinbo Wen, Dongdong Ye et al.
Given the revolutionary role of metaverses, healthcare metaverses are emerging as a transformative force, creating intelligent healthcare systems that offer immersive and personalized services. The healthcare metaverses allow for effective decision-making and data analytics for users. However, there still exist critical challenges in building healthcare metaverses, such as the risk of sensitive data leakage and issues with sensing data security and freshness, as well as concerns around incentivizing data sharing. In this paper, we first design a user-centric privacy-preserving framework based on decentralized Federated Learning (FL) for healthcare metaverses. To further improve the privacy protection of healthcare metaverses, a cross-chain empowered FL framework is utilized to enhance sensing data security. This framework utilizes a hierarchical cross-chain architecture with a main chain and multiple subchains to perform decentralized, privacy-preserving, and secure data training in both virtual and physical spaces. Moreover, we utilize Age of Information (AoI) as an effective data-freshness metric and propose an AoI-based contract theory model under Prospect Theory (PT) to motivate sensing data sharing in a user-centric manner. This model exploits PT to better capture the subjective utility of the service provider. Finally, our numerical results demonstrate the effectiveness of the proposed schemes for healthcare metaverses.
NIAug 9, 2023
Semantic Communications for Artificial Intelligence Generated Content (AIGC) Toward Effective Content CreationGuangyuan Liu, Hongyang Du, Dusit Niyato et al.
Artificial Intelligence Generated Content (AIGC) Services have significant potential in digital content creation. The distinctive abilities of AIGC, such as content generation based on minimal input, hold huge potential, especially when integrating with semantic communication (SemCom). In this paper, a novel comprehensive conceptual model for the integration of AIGC and SemCom is developed. Particularly, a content generation level is introduced on top of the semantic level that provides a clear outline of how AIGC and SemCom interact with each other to produce meaningful and effective content. Moreover, a novel framework that employs AIGC technology is proposed as an encoder and decoder for semantic information, considering the joint optimization of semantic extraction and evaluation metrics tailored to AIGC services. The framework can adapt to different types of content generated, the required quality, and the semantic information utilized. By employing a Deep Q Network (DQN), a case study is presented that provides useful insights into the feasibility of the optimization problem and its convergence characteristics.
NIJul 31, 2022
Exploring Attention-Aware Network Resource Allocation for Customized Metaverse ServicesHongyang Du, Jiacheng Wang, Dusit Niyato et al.
Emerging with the support of computing and communications technologies, Metaverse is expected to bring users unprecedented service experiences. However, the increase in the number of Metaverse users places a heavy demand on network resources, especially for Metaverse services that are based on graphical extended reality and require rendering a plethora of virtual objects. To make efficient use of network resources and improve the Quality-of-Experience (QoE), we design an attention-aware network resource allocation scheme to achieve customized Metaverse services. The aim is to allocate more network resources to virtual objects in which users are more interested. We first discuss several key techniques related to Metaverse services, including QoE analysis, eye-tracking, and remote rendering. We then review existing datasets and propose the user-object-attention level (UOAL) dataset that contains the ground truth attention of 30 users to 96 objects in 1,000 images. A tutorial on how to use UOAL is presented. With the help of UOAL, we propose an attention-aware network resource allocation algorithm that has two steps, i.e., attention prediction and QoE maximization. Specially, we provide an overview of the designs of two types of attention prediction methods, i.e., interest-aware and time-aware prediction. By using the predicted user-object-attention values, network resources such as the rendering capacity of edge devices can be allocated optimally to maximize the QoE. Finally, we propose promising research directions related to Metaverse services.
AIJan 18, 2023
Generative AI-empowered Effective Physical-Virtual Synchronization in the Vehicular MetaverseMinrui Xu, Dusit Niyato, Hongliang Zhang et al.
Metaverse seamlessly blends the physical world and virtual space via ubiquitous communication and computing infrastructure. In transportation systems, the vehicular Metaverse can provide a fully-immersive and hyperreal traveling experience (e.g., via augmented reality head-up displays, AR-HUDs) to drivers and users in autonomous vehicles (AVs) via roadside units (RSUs). However, provisioning real-time and immersive services necessitates effective physical-virtual synchronization between physical and virtual entities, i.e., AVs and Metaverse AR recommenders (MARs). In this paper, we propose a generative AI-empowered physical-virtual synchronization framework for the vehicular Metaverse. In physical-to-virtual synchronization, digital twin (DT) tasks generated by AVs are offloaded for execution in RSU with future route generation. In virtual-to-physical synchronization, MARs customize diverse and personal AR recommendations via generative AI models based on user preferences. Furthermore, we propose a multi-task enhanced auction-based mechanism to match and price AVs and MARs for RSUs to provision real-time and effective services. Finally, property analysis and experimental results demonstrate that the proposed mechanism is strategy-proof and adverse-selection free while increasing social surplus by 50%.
AIJun 26, 2023
Multi-Agent Deep Reinforcement Learning for Dynamic Avatar Migration in AIoT-enabled Vehicular Metaverses with Trajectory PredictionJunlong Chen, Jiawen Kang, Minrui Xu et al.
Avatars, as promising digital assistants in Vehicular Metaverses, can enable drivers and passengers to immerse in 3D virtual spaces, serving as a practical emerging example of Artificial Intelligence of Things (AIoT) in intelligent vehicular environments. The immersive experience is achieved through seamless human-avatar interaction, e.g., augmented reality navigation, which requires intensive resources that are inefficient and impractical to process on intelligent vehicles locally. Fortunately, offloading avatar tasks to RoadSide Units (RSUs) or cloud servers for remote execution can effectively reduce resource consumption. However, the high mobility of vehicles, the dynamic workload of RSUs, and the heterogeneity of RSUs pose novel challenges to making avatar migration decisions. To address these challenges, in this paper, we propose a dynamic migration framework for avatar tasks based on real-time trajectory prediction and Multi-Agent Deep Reinforcement Learning (MADRL). Specifically, we propose a model to predict the future trajectories of intelligent vehicles based on their historical data, indicating the future workloads of RSUs.Based on the expected workloads of RSUs, we formulate the avatar task migration problem as a long-term mixed integer programming problem. To tackle this problem efficiently, the problem is transformed into a Partially Observable Markov Decision Process (POMDP) and solved by multiple DRL agents with hybrid continuous and discrete actions in decentralized. Numerical results demonstrate that our proposed algorithm can effectively reduce the latency of executing avatar tasks by around 25% without prediction and 30% with prediction and enhance user immersive experiences in the AIoT-enabled Vehicular Metaverse (AeVeM).
AIMar 26, 2023
Guiding AI-Generated Digital Content with Wireless PerceptionJiacheng Wang, Hongyang Du, Dusit Niyato et al.
Recent advances in artificial intelligence (AI), coupled with a surge in training data, have led to the widespread use of AI for digital content generation, with ChatGPT serving as a representative example. Despite the increased efficiency and diversity, the inherent instability of AI models poses a persistent challenge in guiding these models to produce the desired content for users. In this paper, we introduce an integration of wireless perception (WP) with AI-generated content (AIGC) and propose a unified WP-AIGC framework to improve the quality of digital content production. The framework employs a novel multi-scale perception technology to read user's posture, which is difficult to describe accurately in words, and transmits it to the AIGC model as skeleton images. Based on these images and user's service requirements, the AIGC model generates corresponding digital content. Since the production process imposes the user's posture as a constraint on the AIGC model, it makes the generated content more aligned with the user's requirements. Additionally, WP-AIGC can also accept user's feedback, allowing adjustment of computing resources at edge server to improve service quality. Experiments results verify the effectiveness of the WP-AIGC framework, highlighting its potential as a novel approach for guiding AI models in the accurate generation of digital content.
AIAug 17, 2023
Artificial Intelligence for Web 3.0: A Comprehensive SurveyMeng Shen, Zhehui Tan, Dusit Niyato et al.
Web 3.0 is the new generation of the Internet that is reconstructed with distributed technology, which focuses on data ownership and value expression. Also, it operates under the principle that data and digital assets should be owned and controlled by users rather than large corporations. In this survey, we explore the current development state of Web 3.0 and the application of AI Technology in Web 3.0. Through investigating the existing applications and components of Web 3.0, we propose an architectural framework for Web 3.0 from the perspective of ecological application scenarios. We outline and divide the ecology of Web 3.0 into four layers. The main functions of each layer are data management, value circulation, ecological governance, and application scenarios. Our investigation delves into the major challenges and issues present in each of these layers. In this context, AI has shown its strong potential to solve existing problems of Web 3.0. We illustrate the crucial role of AI in the foundation and growth of Web 3.0. We begin by providing an overview of AI, including machine learning algorithms and deep learning techniques. Then, we thoroughly analyze the current state of AI technology applications in the four layers of Web 3.0 and offer some insights into its potential future development direction.
CRJun 6, 2023
Adversarial Attacks and Defenses for Semantic Communication in Vehicular MetaversesJiawen Kang, Jiayi He, Hongyang Du et al.
For vehicular metaverses, one of the ultimate user-centric goals is to optimize the immersive experience and Quality of Service (QoS) for users on board. Semantic Communication (SemCom) has been introduced as a revolutionary paradigm that significantly eases communication resource pressure for vehicular metaverse applications to achieve this goal. SemCom enables high-quality and ultra-efficient vehicular communication, even with explosively increasing data traffic among vehicles. In this article, we propose a hierarchical SemCom-enabled vehicular metaverses framework consisting of the global metaverse, local metaverses, SemCom module, and resource pool. The global and local metaverses are brand-new concepts from the metaverse's distribution standpoint. Considering the QoS of users, this article explores the potential security vulnerabilities of the proposed framework. To that purpose, this study highlights a specific security risk to the framework's SemCom module and offers a viable defense solution, so encouraging community researchers to focus more on vehicular metaverse security. Finally, we provide an overview of the open issues of secure SemCom in the vehicular metaverses, notably pointing out potential future research directions.
NISep 24, 2024
Toward Mixture-of-Experts Enabled Trustworthy Semantic Communication for 6G NetworksJiayi He, Xiaofeng Luo, Jiawen Kang et al.
Semantic Communication (SemCom) plays a pivotal role in 6G networks, offering a viable solution for future efficient communication. Deep Learning (DL)-based semantic codecs further enhance this efficiency. However, the vulnerability of DL models to security threats, such as adversarial attacks, poses significant challenges for practical applications of SemCom systems. These vulnerabilities enable attackers to tamper with messages and eavesdrop on private information, especially in wireless communication scenarios. Although existing defenses attempt to address specific threats, they often fail to simultaneously handle multiple heterogeneous attacks. To overcome this limitation, we introduce a novel Mixture-of-Experts (MoE)-based SemCom system. This system comprises a gating network and multiple experts, each specializing in different security challenges. The gating network adaptively selects suitable experts to counter heterogeneous attacks based on user-defined security requirements. Multiple experts collaborate to accomplish semantic communication tasks while meeting the security requirements of users. A case study in vehicular networks demonstrates the efficacy of the MoE-based SemCom system. Simulation results show that the proposed MoE-based SemCom system effectively mitigates concurrent heterogeneous attacks, with minimal impact on downstream task accuracy.
NIAug 2, 2022
Economics of Semantic Communication System: An Auction ApproachZi Qin Liew, Hongyang Du, Wei Yang Bryan Lim et al.
Semantic communication technologies enable wireless edge devices to communicate effectively by transmitting semantic meaning of data. Edge components, such as vehicles in next-generation intelligent transport systems, use well-trained semantic models to encode and decode semantic information extracted from raw and sensor data. However, the limitation in computing resources makes it difficult to support the training process of accurate semantic models on edge devices. As such, edge devices can buy the pretrained semantic models from semantic model providers, which is called "semantic model trading". Upon collecting semantic information with the semantic models, the edge devices can then sell the extracted semantic information, e.g., information about urban road conditions or traffic signs, to the interested buyers for profit, which is called "semantic information trading". To facilitate both types of the trades, effective incentive mechanisms should be designed. Thus, in this paper, we propose a hierarchical trading system to support both semantic model trading and semantic information trading jointly. The proposed incentive mechanism helps to maximize the revenue of semantic model providers in the semantic model trading, and effectively incentivizes model providers to participate in the development of semantic communication systems. For semantic information trading, our designed auction approach can support the trading between multiple semantic information sellers and buyers, while ensuring individual rationality, incentive compatibility, and budget balance, and moreover, allowing them achieve higher utilities than the baseline method.
SYNov 1, 2022
Multi-Resource Allocation for On-Device Distributed Federated Learning SystemsYulan Gao, Ziqiang Ye, Han Yu et al.
This work poses a distributed multi-resource allocation scheme for minimizing the weighted sum of latency and energy consumption in the on-device distributed federated learning (FL) system. Each mobile device in the system engages the model training process within the specified area and allocates its computation and communication resources for deriving and uploading parameters, respectively, to minimize the objective of system subject to the computation/communication budget and a target latency requirement. In particular, mobile devices are connect via wireless TCP/IP architectures. Exploiting the optimization problem structure, the problem can be decomposed to two convex sub-problems. Drawing on the Lagrangian dual and harmony search techniques, we characterize the global optimal solution by the closed-form solutions to all sub-problems, which give qualitative insights to multi-resource tradeoff. Numerical simulations are used to validate the analysis and assess the performance of the proposed algorithm.
GTAug 15, 2023
Vision-based Semantic Communications for Metaverse Services: A Contest Theoretic ApproachGuangyuan Liu, Hongyang Du, Dusit Niyato et al.
The popularity of Metaverse as an entertainment, social, and work platform has led to a great need for seamless avatar integration in the virtual world. In Metaverse, avatars must be updated and rendered to reflect users' behaviour. Achieving real-time synchronization between the virtual bilocation and the user is complex, placing high demands on the Metaverse Service Provider (MSP)'s rendering resource allocation scheme. To tackle this issue, we propose a semantic communication framework that leverages contest theory to model the interactions between users and MSPs and determine optimal resource allocation for each user. To reduce the consumption of network resources in wireless transmission, we use the semantic communication technique to reduce the amount of data to be transmitted. Under our simulation settings, the encoded semantic data only contains 51 bytes of skeleton coordinates instead of the image size of 8.243 megabytes. Moreover, we implement Deep Q-Network to optimize reward settings for maximum performance and efficient resource allocation. With the optimal reward setting, users are incentivized to select their respective suitable uploading frequency, reducing down-sampling loss due to rendering resource constraints by 66.076\% compared with the traditional average distribution method. The framework provides a novel solution to resource allocation for avatar association in VR environments, ensuring a smooth and immersive experience for all users.
CVApr 19, 2023
DADFNet: Dual Attention and Dual Frequency-Guided Dehazing Network for Video-Empowered Intelligent TransportationYu Guo, Ryan Wen Liu, Jiangtian Nie et al.
Visual surveillance technology is an indispensable functional component of advanced traffic management systems. It has been applied to perform traffic supervision tasks, such as object detection, tracking and recognition. However, adverse weather conditions, e.g., fog, haze and mist, pose severe challenges for video-based transportation surveillance. To eliminate the influences of adverse weather conditions, we propose a dual attention and dual frequency-guided dehazing network (termed DADFNet) for real-time visibility enhancement. It consists of a dual attention module (DAM) and a high-low frequency-guided sub-net (HLFN) to jointly consider the attention and frequency mapping to guide haze-free scene reconstruction. Extensive experiments on both synthetic and real-world images demonstrate the superiority of DADFNet over state-of-the-art methods in terms of visibility enhancement and improvement in detection accuracy. Furthermore, DADFNet only takes $6.3$ ms to process a 1,920 * 1,080 image on the 2080 Ti GPU, making it highly efficient for deployment in intelligent transportation systems.
DCNov 4, 2025
Federated Attention: A Distributed Paradigm for Collaborative LLM Inference over Edge NetworksXiumei Deng, Zehui Xiong, Binbin Chen et al.
Large language models (LLMs) are proliferating rapidly at the edge, delivering intelligent capabilities across diverse application scenarios. However, their practical deployment in collaborative scenarios confronts fundamental challenges: privacy vulnerabilities, communication overhead, and computational bottlenecks. To address these, we propose Federated Attention (FedAttn), which integrates the federated paradigm into the self-attention mechanism, creating a new distributed LLM inference framework that simultaneously achieves privacy protection, communication efficiency, and computational efficiency. FedAttn enables participants to perform local self-attention over their own token representations while periodically exchanging and aggregating Key-Value (KV) matrices across multiple Transformer blocks, collaboratively generating LLM responses without exposing private prompts. Further, we identify a structural duality between contextual representation refinement in FedAttn and parameter optimization in FL across private data, local computation, and global aggregation. This key insight provides a principled foundation for systematically porting federated optimization techniques to collaborative LLM inference. Building on this framework, we theoretically analyze how local self-attention computation within participants and heterogeneous token relevance among participants shape error propagation dynamics across Transformer blocks. Moreover, we characterize the fundamental trade-off between response quality and communication/computation efficiency, which is governed by the synchronization interval and the number of participants. Experimental results validate our theoretical analysis, and reveal significant optimization opportunities through sparse attention and adaptive KV aggregation, highlighting FedAttn's potential to deliver scalability and efficiency in real-world edge deployments.
NIJan 15
Large Language Model (LLM)-enabled Reinforcement Learning for Wireless Network OptimizationJie Zheng, Ruichen Zhang, Dusit Niyato et al.
Enhancing future wireless networks presents a significant challenge for networking systems due to diverse user demands and the emergence of 6G technology. While reinforcement learning (RL) is a powerful framework, it often encounters difficulties with high-dimensional state spaces and complex environments, leading to substantial computational demands, distributed intelligence, and potentially inconsistent outcomes. Large language models (LLMs), with their extensive pretrained knowledge and advanced reasoning capabilities, offer promising tools to enhance RL in optimizing 6G wireless networks. We explore RL models augmented by LLMs, emphasizing their roles and the potential benefits of their synergy in wireless network optimization. We then examine LLM-enabled RL across various protocol layers: physical, data link, network, transport, and application layers. Additionally, we propose an LLM-assisted state representation and semantic extraction to enhance the multi-agent reinforcement learning (MARL) framework. This approach is applied to service migration and request routing, as well as topology graph generation in unmanned aerial vehicle (UAV)-satellite networks. Through case studies, we demonstrate that our framework effectively performs optimization of wireless network. Finally, we outline prospective research directions for LLM-enabled RL in wireless network optimization.
CVNov 3, 2025
SecDiff: Diffusion-Aided Secure Deep Joint Source-Channel Coding Against Adversarial AttacksChangyuan Zhao, Jiacheng Wang, Ruichen Zhang et al.
Deep joint source-channel coding (JSCC) has emerged as a promising paradigm for semantic communication, delivering significant performance gains over conventional separate coding schemes. However, existing JSCC frameworks remain vulnerable to physical-layer adversarial threats, such as pilot spoofing and subcarrier jamming, compromising semantic fidelity. In this paper, we propose SecDiff, a plug-and-play, diffusion-aided decoding framework that significantly enhances the security and robustness of deep JSCC under adversarial wireless environments. Different from prior diffusion-guided JSCC methods that suffer from high inference latency, SecDiff employs pseudoinverse-guided sampling and adaptive guidance weighting, enabling flexible step-size control and efficient semantic reconstruction. To counter jamming attacks, we introduce a power-based subcarrier masking strategy and recast recovery as a masked inpainting problem, solved via diffusion guidance. For pilot spoofing, we formulate channel estimation as a blind inverse problem and develop an expectation-minimization (EM)-driven reconstruction algorithm, guided jointly by reconstruction loss and a channel operator. Notably, our method alternates between pilot recovery and channel estimation, enabling joint refinement of both variables throughout the diffusion process. Extensive experiments over orthogonal frequency-division multiplexing (OFDM) channels under adversarial conditions show that SecDiff outperforms existing secure and generative JSCC baselines by achieving a favorable trade-off between reconstruction quality and computational cost. This balance makes SecDiff a promising step toward practical, low-latency, and attack-resilient semantic communications.
NIApr 10
Generative AI Agent Empowered Power Allocation for HAP Propulsion and Communication SystemsXiaoyu Xing, Dingyi Lu, Peng Yang et al.
High altitude platforms (HAPs) are emerging as a key enabler for 6G coverage, yet limited energy must be split between propulsion and communications. Most prior HAP studies ignore propulsion power or rely on surrogates that miss hull-propeller interference, leading to misestimated communication power budgets and degraded beamforming. More importantly, HAP power allocation is intrinsically a multi-system, multidisciplinary problem in which aerodynamics, propulsion-system efficiency, and communication-system performance (quality of service (QoS) and energy efficiency (EE)) are tightly coupled.To address these challenges, this paper designs an interactive generative artificial intelligence (AI)-empowered HAP power allocation agent.By interacting with the AI agent, we develop an accurate propulsion power consumption model that takes into account the aerodynamic interference between the HAP's hull and the propeller. Assisted by the AI agent, we further formulate a HAP beamforming problem to improve user QoS and enhance the EE of the HAP communication system.This paper also proposes a QoS-enhanced energy-efficient (Q3E) beamforming algorithm to solve the formulated problem.Simulation results demonstrate the accuracy of the propulsion-power model and the effectiveness of the Q3E algorithm.
NIApr 10
Multimodal Large Language Model Enabled Robust Beamforming for HAP Downlink CommunicationsXiaoyu Xing, Peng Yang, Guoquan Tao et al.
Small changes in high altitude platform (HAP) attitude can cause significant deviations in HAP downlink beam directions, thereby severely degrading HAP downlink communication performance. In this paper, we develop a multimodal large language model (LLM) enabled beamforming framework to achieve robust HAP downlink communications.Specifically, we design a vision-language LLM (VL-LLM) that learns from multivariate flight telemetry to forecast short-term HAP attitudes under platform shaking and support delay-aware proactive beam steering.We design an offline forecast-error calibration procedure to obtain upper bounds on forecast errors and improve the reliability of proactive analog beam steering.Based on the attitude forecasts, we proactively update the analog beamformer and propose a QoS-driven beamforming and admission method with a lightweight feasibility-enforcement step to satisfy instantaneous transmit-power and QoS requirements.Simulation results indicate that the designed VL-LLM can accurately capture changes in the HAP attitude and the proposed beamforming method achieves a 22.1% higher user service ratio and a 12.5% higher sum-rate than representative baselines.The measured mean and p99 total latencies are 36.24 ms and 40.13 ms, respectively, supporting practical delay-aware deployment.
NIMar 22
DRL-driven Online Optimization for Joint Traffic Reshaping and Channel Reconfiguration in RIS-assisted Semantic NOMA CommunicationsSonghan Zhao, Shimin Gong, Bo Gu et al.
This paper explores a reconfigurable intelligent surface (RIS)-assisted and semantic-aware wireless network, where multiple semantic users (SUs) transmit semantic information to an access point (AP) using the non-orthogonal multiple access (NOMA) method. The RIS reconfigures channel conditions, while semantic extraction reshapes traffic demands, providing enhanced control flexibility for NOMA transmissions. To enable efficient long-term resource allocation, we propose a deferrable semantic extraction scheme that can distribute the semantic extraction tasks across multiple time slots. We formulate a long-term energy efficiency maximization problem by jointly optimizing the RIS's passive beamforming, the SUs' semantic extraction, and the NOMA decoding order. Note that this problem involves multiple and coupled control variables, which can incur significant computational overhead in time-varying network environments. To support low-complexity online optimization, a deep reinforcement learning (DRL)-driven online optimization framework is developed. Specifically, the DRL module facilitates the adaptive selection and optimization of the most suitable option from traffic reshaping, channel reconfiguration, or NOMA decoding order assignment based on the dynamic network status. Numerical results demonstrate that the deferrable semantic extraction scheme significantly improves the long-term energy efficiency. Meanwhile, the DRL-driven online optimization framework effectively reduces the running time while maintaining superior learning performance compared to state-of-the-art methods.
NIMar 22
Generative Artificial Intelligence Assisted Multi-modal Semantic Extraction for NOMA-based Image TransmissionsSonghan Zhao, Shimin Gong, Bo Gu et al.
In this paper, we investigate a generative artificial intelligence (GAI)-assisted semantic communication framework for non-orthogonal multiple access (NOMA)-based image transmissions. Semantic users (SUs) extract cross-modal semantic features from the raw images, which are then used for image recovery by leveraging a GAI model. The GAI enhances the generalization and recovery of semantic image transmissions, while NOMA efficiently allocates transmission capacities to SUs based on their traffic demands. Thus, the semantic extraction and transmission control jointly affect both semantic recovery performance and transmission overhead. We maximize a weighted performance of transmission latency and semantic recovery accuracy by jointly optimizing the semantic feature selection at the semantic level, as well as the receive beamforming and NOMA decoding order at the transmission level. To reduce potential redundancy in semantic features and improve optimization efficiency, we develop an importance-aware and model-driven proximal policy optimization (IM-PPO) framework. Specifically, we quantify and retain high-importance semantic features to enhance the learning efficiency of PPO, while model-based optimization methods are used to adapt the transmission control variables. Numerical results validate that the joint adjustment of the semantic feature selection and the transmission control significantly improves the semantic recovery accuracy and the transmission latency performance. Moreover, the IM-PPO framework effectively leverages the model information to improve the learning efficiency compared to benchmark methods.
NIApr 14, 2024
Generative AI Agents with Large Language Model for Satellite Networks via a Mixture of Experts TransmissionRuichen Zhang, Hongyang Du, Yinqiu Liu et al.
In response to the needs of 6G global communications, satellite communication networks have emerged as a key solution. However, the large-scale development of satellite communication networks is constrained by the complex system models, whose modeling is challenging for massive users. Moreover, transmission interference between satellites and users seriously affects communication performance. To solve these problems, this paper develops generative artificial intelligence (AI) agents for model formulation and then applies a mixture of experts (MoE) approach to design transmission strategies. Specifically, we leverage large language models (LLMs) to build an interactive modeling paradigm and utilize retrieval-augmented generation (RAG) to extract satellite expert knowledge that supports mathematical modeling. Afterward, by integrating the expertise of multiple specialized components, we propose an MoE-proximal policy optimization (PPO) approach to solve the formulated problem. Each expert can optimize the optimization variables at which it excels through specialized training through its own network and then aggregates them through the gating network to perform joint optimization. The simulation results validate the accuracy and effectiveness of employing a generative agent for problem formulation. Furthermore, the superiority of the proposed MoE-ppo approach over other benchmarks is confirmed in solving the formulated problem. The adaptability of MoE-PPO to various customized modeling problems has also been demonstrated.
AIJan 15, 2024
When Large Language Model Agents Meet 6G Networks: Perception, Grounding, and AlignmentMinrui Xu, Dusit Niyato, Jiawen Kang et al.
AI agents based on multimodal large language models (LLMs) are expected to revolutionize human-computer interaction and offer more personalized assistant services across various domains like healthcare, education, manufacturing, and entertainment. Deploying LLM agents in 6G networks enables users to access previously expensive AI assistant services via mobile devices democratically, thereby reducing interaction latency and better preserving user privacy. Nevertheless, the limited capacity of mobile devices constrains the effectiveness of deploying and executing local LLMs, which necessitates offloading complex tasks to global LLMs running on edge servers during long-horizon interactions. In this article, we propose a split learning system for LLM agents in 6G networks leveraging the collaboration between mobile devices and edge servers, where multiple LLMs with different roles are distributed across mobile devices and edge servers to perform user-agent interactive tasks collaboratively. In the proposed system, LLM agents are split into perception, grounding, and alignment modules, facilitating inter-module communications to meet extended user requirements on 6G network functions, including integrated sensing and communication, digital twins, and task-oriented communications. Furthermore, we introduce a novel model caching algorithm for LLMs within the proposed system to improve model utilization in context, thus reducing network costs of the collaborative mobile and edge LLM agents.
ROFeb 28, 2024
Generative AI for Unmanned Vehicle Swarms: Challenges, Applications and OpportunitiesGuangyuan Liu, Nguyen Van Huynh, Hongyang Du et al.
With recent advances in artificial intelligence (AI) and robotics, unmanned vehicle swarms have received great attention from both academia and industry due to their potential to provide services that are difficult and dangerous to perform by humans. However, learning and coordinating movements and actions for a large number of unmanned vehicles in complex and dynamic environments introduce significant challenges to conventional AI methods. Generative AI (GAI), with its capabilities in complex data feature extraction, transformation, and enhancement, offers great potential in solving these challenges of unmanned vehicle swarms. For that, this paper aims to provide a comprehensive survey on applications, challenges, and opportunities of GAI in unmanned vehicle swarms. Specifically, we first present an overview of unmanned vehicles and unmanned vehicle swarms as well as their use cases and existing issues. Then, an in-depth background of various GAI techniques together with their capabilities in enhancing unmanned vehicle swarms are provided. After that, we present a comprehensive review on the applications and challenges of GAI in unmanned vehicle swarms with various insights and discussions. Finally, we highlight open issues of GAI in unmanned vehicle swarms and discuss potential research directions.
NIFeb 24, 2025
Toward Agentic AI: Generative Information Retrieval Inspired Intelligent Communications and NetworkingRuichen Zhang, Shunpu Tang, Yinqiu Liu et al.
The increasing complexity and scale of modern telecommunications networks demand intelligent automation to enhance efficiency, adaptability, and resilience. Agentic AI has emerged as a key paradigm for intelligent communications and networking, enabling AI-driven agents to perceive, reason, decide, and act within dynamic networking environments. However, effective decision-making in telecom applications, such as network planning, management, and resource allocation, requires integrating retrieval mechanisms that support multi-hop reasoning, historical cross-referencing, and compliance with evolving 3GPP standards. This article presents a forward-looking perspective on generative information retrieval-inspired intelligent communications and networking, emphasizing the role of knowledge acquisition, processing, and retrieval in agentic AI for telecom systems. We first provide a comprehensive review of generative information retrieval strategies, including traditional retrieval, hybrid retrieval, semantic retrieval, knowledge-based retrieval, and agentic contextual retrieval. We then analyze their advantages, limitations, and suitability for various networking scenarios. Next, we present a survey about their applications in communications and networking. Additionally, we introduce an agentic contextual retrieval framework to enhance telecom-specific planning by integrating multi-source retrieval, structured reasoning, and self-reflective validation. Experimental results demonstrate that our framework significantly improves answer accuracy, explanation consistency, and retrieval efficiency compared to traditional and semantic retrieval methods. Finally, we outline future research directions.
NIApr 25, 2024
Integration of Mixture of Experts and Multimodal Generative AI in Internet of Vehicles: A SurveyMinrui Xu, Dusit Niyato, Jiawen Kang et al.
Generative AI (GAI) can enhance the cognitive, reasoning, and planning capabilities of intelligent modules in the Internet of Vehicles (IoV) by synthesizing augmented datasets, completing sensor data, and making sequential decisions. In addition, the mixture of experts (MoE) can enable the distributed and collaborative execution of AI models without performance degradation between connected vehicles. In this survey, we explore the integration of MoE and GAI to enable Artificial General Intelligence in IoV, which can enable the realization of full autonomy for IoV with minimal human supervision and applicability in a wide range of mobility scenarios, including environment monitoring, traffic management, and autonomous driving. In particular, we present the fundamentals of GAI, MoE, and their interplay applications in IoV. Furthermore, we discuss the potential integration of MoE and GAI in IoV, including distributed perception and monitoring, collaborative decision-making and planning, and generative modeling and simulation. Finally, we present several potential research directions for facilitating the integration.
NIApr 10, 2024
Agent-driven Generative Semantic Communication with Cross-Modality and PredictionWanting Yang, Zehui Xiong, Yanli Yuan et al.
In the era of 6G, with compelling visions of intelligent transportation systems and digital twins, remote surveillance is poised to become a ubiquitous practice. Substantial data volume and frequent updates present challenges in wireless networks. To address these challenges, we propose a novel agent-driven generative semantic communication (A-GSC) framework based on reinforcement learning. In contrast to the existing research on semantic communication (SemCom), which mainly focuses on either semantic extraction or semantic sampling, we seamlessly integrate both by jointly considering the intrinsic attributes of source information and the contextual information regarding the task. Notably, the introduction of generative artificial intelligence (GAI) enables the independent design of semantic encoders and decoders. In this work, we develop an agent-assisted semantic encoder with cross-modality capability, which can track the semantic changes, channel condition, to perform adaptive semantic extraction and sampling. Accordingly, we design a semantic decoder with both predictive and generative capabilities, consisting of two tailored modules. Moreover, the effectiveness of the designed models has been verified using the UA-DETRAC dataset, demonstrating the performance gains of the overall A-GSC framework in both energy saving and reconstruction accuracy.
CVOct 24, 2024
Radar and Camera Fusion for Object Detection and Tracking: A Comprehensive SurveyKun Shi, Shibo He, Zhenyu Shi et al.
Multi-modal fusion is imperative to the implementation of reliable object detection and tracking in complex environments. Exploiting the synergy of heterogeneous modal information endows perception systems the ability to achieve more comprehensive, robust, and accurate performance. As a nucleus concern in wireless-vision collaboration, radar-camera fusion has prompted prospective research directions owing to its extensive applicability, complementarity, and compatibility. Nonetheless, there still lacks a systematic survey specifically focusing on deep fusion of radar and camera for object detection and tracking. To fill this void, we embark on an endeavor to comprehensively review radar-camera fusion in a holistic way. First, we elaborate on the fundamental principles, methodologies, and applications of radar-camera fusion perception. Next, we delve into the key techniques concerning sensor calibration, modal representation, data alignment, and fusion operation. Furthermore, we provide a detailed taxonomy covering the research topics related to object detection and tracking in the context of radar and camera technologies.Finally, we discuss the emerging perspectives in the field of radar-camera fusion perception and highlight the potential areas for future research.
LGMay 12, 2024
Edge Intelligence Optimization for Large Language Model Inference with Batching and QuantizationXinyuan Zhang, Jiang Liu, Zehui Xiong et al.
Generative Artificial Intelligence (GAI) is taking the world by storm with its unparalleled content creation ability. Large Language Models (LLMs) are at the forefront of this movement. However, the significant resource demands of LLMs often require cloud hosting, which raises issues regarding privacy, latency, and usage limitations. Although edge intelligence has long been utilized to solve these challenges by enabling real-time AI computation on ubiquitous edge resources close to data sources, most research has focused on traditional AI models and has left a gap in addressing the unique characteristics of LLM inference, such as considerable model size, auto-regressive processes, and self-attention mechanisms. In this paper, we present an edge intelligence optimization problem tailored for LLM inference. Specifically, with the deployment of the batching technique and model quantization on resource-limited edge devices, we formulate an inference model for transformer decoder-based LLMs. Furthermore, our approach aims to maximize the inference throughput via batch scheduling and joint allocation of communication and computation resources, while also considering edge resource constraints and varying user requirements of latency and accuracy. To address this NP-hard problem, we develop an optimal Depth-First Tree-Searching algorithm with online tree-Pruning (DFTSP) that operates within a feasible time complexity. Simulation results indicate that DFTSP surpasses other batching benchmarks in throughput across diverse user settings and quantization techniques, and it reduces time complexity by over 45% compared to the brute-force searching method.
CRJun 5, 2025
BESA: Boosting Encoder Stealing Attack with Perturbation RecoveryXuhao Ren, Haotian Liang, Yajie Wang et al.
To boost the encoder stealing attack under the perturbation-based defense that hinders the attack performance, we propose a boosting encoder stealing attack with perturbation recovery named BESA. It aims to overcome perturbation-based defenses. The core of BESA consists of two modules: perturbation detection and perturbation recovery, which can be combined with canonical encoder stealing attacks. The perturbation detection module utilizes the feature vectors obtained from the target encoder to infer the defense mechanism employed by the service provider. Once the defense mechanism is detected, the perturbation recovery module leverages the well-designed generative model to restore a clean feature vector from the perturbed one. Through extensive evaluations based on various datasets, we demonstrate that BESA significantly enhances the surrogate encoder accuracy of existing encoder stealing attacks by up to 24.63\% when facing state-of-the-art defenses and combinations of multiple defenses.
NIMay 18, 2025
LAMeTA: Intent-Aware Agentic Network Optimization via a Large AI Model-Empowered Two-Stage ApproachYinqiu Liu, Guangyuan Liu, Jiacheng Wang et al.
Nowadays, Generative AI (GenAI) reshapes numerous domains by enabling machines to create content across modalities. As GenAI evolves into autonomous agents capable of reasoning, collaboration, and interaction, they are increasingly deployed on network infrastructures to serve humans automatically. This emerging paradigm, known as the agentic network, presents new optimization challenges due to the demand to incorporate subjective intents of human users expressed in natural language. Traditional generic Deep Reinforcement Learning (DRL) struggles to capture intent semantics and adjust policies dynamically, thus leading to suboptimality. In this paper, we present LAMeTA, a Large AI Model (LAM)-empowered Two-stage Approach for intent-aware agentic network optimization. First, we propose Intent-oriented Knowledge Distillation (IoKD), which efficiently distills intent-understanding capabilities from resource-intensive LAMs to lightweight edge LAMs (E-LAMs) to serve end users. Second, we develop Symbiotic Reinforcement Learning (SRL), integrating E-LAMs with a policy-based DRL framework. In SRL, E-LAMs translate natural language user intents into structured preference vectors that guide both state representation and reward design. The DRL, in turn, optimizes the generative service function chain composition and E-LAM selection based on real-time network conditions, thus optimizing the subjective Quality-of-Experience (QoE). Extensive experiments conducted in an agentic network with 81 agents demonstrate that IoKD reduces mean squared error in intent prediction by up to 22.5%, while SRL outperforms conventional generic DRL by up to 23.5% in maximizing intent-aware QoE.
HCOct 14, 2024
Tracing Human Stress from Physiological Signals using UWB RadarJia Xu, Teng Xiao, Pin Lv et al.
Stress tracing is an important research domain that supports many applications, such as health care and stress management; and its closest related works are derived from stress detection. However, these existing works cannot well address two important challenges facing stress detection. First, most of these studies involve asking users to wear physiological sensors to detect their stress states, which has a negative impact on the user experience. Second, these studies have failed to effectively utilize multimodal physiological signals, which results in less satisfactory detection results. This paper formally defines the stress tracing problem, which emphasizes the continuous detection of human stress states. A novel deep stress tracing method, named DST, is presented. Note that DST proposes tracing human stress based on physiological signals collected by a noncontact ultrawideband radar, which is more friendly to users when collecting their physiological signals. In DST, a signal extraction module is carefully designed at first to robustly extract multimodal physiological signals from the raw RF data of the radar, even in the presence of body movement. Afterward, a multimodal fusion module is proposed in DST to ensure that the extracted multimodal physiological signals can be effectively fused and utilized. Extensive experiments are conducted on three real-world datasets, including one self-collected dataset and two publicity datasets. Experimental results show that the proposed DST method significantly outperforms all the baselines in terms of tracing human stress states. On average, DST averagely provides a 6.31% increase in detection accuracy on all datasets, compared with the best baselines.
SYSep 24, 2025
CollaPipe: Adaptive Segment-Optimized Pipeline Parallelism for Collaborative LLM Training in Heterogeneous Edge NetworksJiewei Chen, Xiumei Deng, Zehui Xiong et al.
The increasing demand for intelligent mobile applications has made multi-agent collaboration with Transformer-based large language models (LLMs) essential in mobile edge computing (MEC) networks. However, training LLMs in such environments remains challenging due to heavy computation, high end-to-end latency, and limited model generalization. We introduce CollaPipe, a hybrid distributed learning framework that integrates collaborative pipeline parallelism with federated aggregation to support self-evolving intelligent networks. In CollaPipe, the encoder part is adaptively partitioned into variable-sized segments and deployed across mobile devices for pipeline-parallel training, while the decoder is deployed on edge servers to handle generative tasks. Then we perform global model update via federated aggregation. To enhance training efficiency, we formulate a joint optimization problem that adaptively allocates model segments, micro-batches, bandwidth, and transmission power. We derive and use a closed-form convergence bound to design an Dynamic Segment Scheduling and Resource Allocation (DSSDA) algorithm based on Lyapunov optimization, ensuring system stability under long-term constraints. Extensive experiments on downstream tasks with Transformer and BERT models show that CollaPipe improves computation efficiency by up to 15.09%, reduces end-to-end latency by at least 48.98%, and cuts single device memory usage by more than half, enabling online learning in heterogeneous and dynamic communication environments.
LGAug 6, 2025
Edge-Assisted Collaborative Fine-Tuning for Multi-User Personalized Artificial Intelligence Generated Content (AIGC)Nan Li, Wanting Yang, Marie Siew et al.
Diffusion models (DMs) have emerged as powerful tools for high-quality content generation, yet their intensive computational requirements for inference pose challenges for resource-constrained edge devices. Cloud-based solutions aid in computation but often fall short in addressing privacy risks, personalization efficiency, and communication costs in multi-user edge-AIGC scenarios. To bridge this gap, we first analyze existing edge-AIGC applications in personalized content synthesis, revealing their limitations in efficiency and scalability. We then propose a novel cluster-aware hierarchical federated aggregation framework. Based on parameter-efficient local fine-tuning via Low-Rank Adaptation (LoRA), the framework first clusters clients based on the similarity of their uploaded task requirements, followed by an intra-cluster aggregation for enhanced personalization at the server-side. Subsequently, an inter-cluster knowledge interaction paradigm is implemented to enable hybrid-style content generation across diverse clusters.Building upon federated learning (FL) collaboration, our framework simultaneously trains personalized models for individual users at the devices and a shared global model enhanced with multiple LoRA adapters on the server,enabling efficient edge inference; meanwhile, all prompts for clustering and inference are encoded prior to transmission, thereby further mitigating the risk of plaintext leakage. Our evaluations demonstrate that the framework achieves accelerated convergence while maintaining practical viability for scalable multi-user personalized AIGC services under edge constraints.
CRAug 1, 2025
CyGATE: Game-Theoretic Cyber Attack-Defense Engine for Patch Strategy OptimizationYuning Jiang, Nay Oo, Qiaoran Meng et al.
Modern cyber attacks unfold through multiple stages, requiring defenders to dynamically prioritize mitigations under uncertainty. While game-theoretic models capture attacker-defender interactions, existing approaches often rely on static assumptions and lack integration with real-time threat intelligence, limiting their adaptability. This paper presents CyGATE, a game-theoretic framework modeling attacker-defender interactions, using large language models (LLMs) with retrieval-augmented generation (RAG) to enhance tactic selection and patch prioritization. Applied to a two-agent scenario, CyGATE frames cyber conflicts as a partially observable stochastic game (POSG) across Cyber Kill Chain stages. Both agents use belief states to navigate uncertainty, with the attacker adapting tactics and the defender re-prioritizing patches based on evolving risks and observed adversary behavior. The framework's flexible architecture enables extension to multi-agent scenarios involving coordinated attackers, collaborative defenders, or complex enterprise environments with multiple stakeholders. Evaluated in a dynamic patch scheduling scenario, CyGATE effectively prioritizes high-risk vulnerabilities, enhancing adaptability through dynamic threat integration, strategic foresight by anticipating attacker moves under uncertainty, and efficiency by optimizing resource use.
CVJun 3, 2025
Channel-adaptive Cross-modal Generative Semantic Communication for Point Cloud TransmissionWanting Yang, Zehui Xiong, Qianqian Yang et al.
With the rapid development of autonomous driving and extended reality, efficient transmission of point clouds (PCs) has become increasingly important. In this context, we propose a novel channel-adaptive cross-modal generative semantic communication (SemCom) for PC transmission, called GenSeC-PC. GenSeC-PC employs a semantic encoder that fuses images and point clouds, where images serve as non-transmitted side information. Meanwhile, the decoder is built upon the backbone of PointDif. Such a cross-modal design not only ensures high compression efficiency but also delivers superior reconstruction performance compared to PointDif. Moreover, to ensure robust transmission and reduce system complexity, we design a streamlined and asymmetric channel-adaptive joint semantic-channel coding architecture, where only the encoder needs the feedback of average signal-to-noise ratio (SNR) and available bandwidth. In addition, rectified denoising diffusion implicit models is employed to accelerate the decoding process to the millisecond level, enabling real-time PC communication. Unlike existing methods, GenSeC-PC leverages generative priors to ensure reliable reconstruction even from noisy or incomplete source PCs. More importantly, it supports fully analog transmission, improving compression efficiency by eliminating the need for error-free side information transmission common in prior SemCom approaches. Simulation results confirm the effectiveness of cross-modal semantic extraction and dual-metric guided fine-tuning, highlighting the framework's robustness across diverse conditions, including low SNR, bandwidth limitations, varying numbers of 2D images, and previously unseen objects.
LGOct 25, 2024
MetaTrading: An Immersion-Aware Model Trading Framework for Vehicular Metaverse ServicesHongjia Wu, Hui Zeng, Zehui Xiong et al.
Timely updating of Internet of Things data is crucial for achieving immersion in vehicular metaverse services. However, challenges such as latency caused by massive data transmissions, privacy risks associated with user data, and computational burdens on metaverse service providers (MSPs) hinder the continuous collection of high-quality data. To address these challenges, we propose an immersion-aware model trading framework that enables efficient and privacy-preserving data provisioning through federated learning (FL). Specifically, we first develop a novel multi-dimensional evaluation metric for the immersion of models (IoM). The metric considers the freshness and accuracy of the local model, and the amount and potential value of raw training data. Building on the IoM, we design an incentive mechanism to encourage metaverse users (MUs) to participate in FL by providing local updates to MSPs under resource constraints. The trading interactions between MSPs and MUs are modeled as an equilibrium problem with equilibrium constraints (EPEC) to analyze and balance their costs and gains, where MSPs as leaders determine rewards, while MUs as followers optimize resource allocation. To ensure privacy and adapt to dynamic network conditions, we develop a distributed dynamic reward algorithm based on deep reinforcement learning, without acquiring any private information from MUs and other MSPs. Experimental results show that the proposed framework outperforms state-of-the-art benchmarks, achieving improvements in IoM of 38.3% and 37.2%, and reductions in training time to reach the target accuracy of 43.5% and 49.8%, on average, for the MNIST and GTSRB datasets, respectively. These findings validate the effectiveness of our approach in incentivizing MUs to contribute high-value local models to MSPs, providing a flexible and adaptive scheme for data provisioning in vehicular metaverse services.
AIJun 8, 2024
Multi-attribute Auction-based Resource Allocation for Twins Migration in Vehicular Metaverses: A GPT-based DRL ApproachYongju Tong, Junlong Chen, Minrui Xu et al.
Vehicular Metaverses are developed to enhance the modern automotive industry with an immersive and safe experience among connected vehicles and roadside infrastructures, e.g., RoadSide Units (RSUs). For seamless synchronization with virtual spaces, Vehicle Twins (VTs) are constructed as digital representations of physical entities. However, resource-intensive VTs updating and high mobility of vehicles require intensive computation, communication, and storage resources, especially for their migration among RSUs with limited coverages. To address these issues, we propose an attribute-aware auction-based mechanism to optimize resource allocation during VTs migration by considering both price and non-monetary attributes, e.g., location and reputation. In this mechanism, we propose a two-stage matching for vehicular users and Metaverse service providers in multi-attribute resource markets. First, the resource attributes matching algorithm obtains the resource attributes perfect matching, namely, buyers and sellers can participate in a double Dutch auction (DDA). Then, we train a DDA auctioneer using a generative pre-trained transformer (GPT)-based deep reinforcement learning (DRL) algorithm to adjust the auction clocks efficiently during the auction process. We compare the performance of social welfare and auction information exchange costs with state-of-the-art baselines under different settings. Simulation results show that our proposed GPT-based DRL auction schemes have better performance than others.
LGJan 19, 2024
Deep Reinforcement Learning Empowered Activity-Aware Dynamic Health Monitoring SystemsZiqiaing Ye, Yulan Gao, Yue Xiao et al.
In smart healthcare, health monitoring utilizes diverse tools and technologies to analyze patients' real-time biosignal data, enabling immediate actions and interventions. Existing monitoring approaches were designed on the premise that medical devices track several health metrics concurrently, tailored to their designated functional scope. This means that they report all relevant health values within that scope, which can result in excess resource use and the gathering of extraneous data due to monitoring irrelevant health metrics. In this context, we propose Dynamic Activity-Aware Health Monitoring strategy (DActAHM) for striking a balance between optimal monitoring performance and cost efficiency, a novel framework based on Deep Reinforcement Learning (DRL) and SlowFast Model to ensure precise monitoring based on users' activities. Specifically, with the SlowFast Model, DActAHM efficiently identifies individual activities and captures these results for enhanced processing. Subsequently, DActAHM refines health metric monitoring in response to the identified activity by incorporating a DRL framework. Extensive experiments comparing DActAHM against three state-of-the-art approaches demonstrate it achieves 27.3% higher gain than the best-performing baseline that fixes monitoring actions over timeline.
CRFeb 18, 2022
Incentive Mechanism Design for Joint Resource Allocation in Blockchain-based Federated LearningZhilin Wang, Qin Hu, Ruinian Li et al.
Blockchain-based federated learning (BCFL) has recently gained tremendous attention because of its advantages such as decentralization and privacy protection of raw data. However, there has been few research focusing on the allocation of resources for clients in BCFL. In the BCFL framework where the FL clients and the blockchain miners are the same devices, clients broadcast the trained model updates to the blockchain network and then perform mining to generate new blocks. Since each client has a limited amount of computing resources, the problem of allocating computing resources into training and mining needs to be carefully addressed. In this paper, we design an incentive mechanism to assign each client appropriate rewards for training and mining, and then the client will determine the amount of computing power to allocate for each subtask based on these rewards using the two-stage Stackelberg game. After analyzing the utilities of the model owner (MO) (i.e., the BCFL task publisher) and clients, we transform the game model into two optimization problems, which are sequentially solved to derive the optimal strategies for both the MO and clients. Further, considering the fact that local training related information of each client may not be known by others, we extend the game model with analytical solutions to the incomplete information scenario. Extensive experimental results demonstrate the validity of our proposed schemes.
CYJan 10, 2022
Fusing Blockchain and AI with Metaverse: A SurveyQinglin Yang, Yetong Zhao, Huawei Huang et al.
Metaverse as the latest buzzword has attracted great attention from both industry and academia. Metaverse seamlessly integrates the real world with the virtual world and allows avatars to carry out rich activities including creation, display, entertainment, social networking, and trading. Thus, it is promising to build an exciting digital world and to transform a better physical world through the exploration of the metaverse. In this survey, we dive into the metaverse by discussing how Blockchain and Artificial Intelligence (AI) fuse with it through investigating the state-of-the-art studies across the metaverse components, digital currencies, AI applications in the virtual world, and blockchain-empowered technologies. Further exploitation and interdisciplinary research on the fusion of AI and Blockchain towards metaverse will definitely require collaboration from both academia and industries. We wish that our survey can help researchers, engineers, and educators build an open, fair, and rational future metaverse.
LGJan 3, 2022
Robust Semi-supervised Federated Learning for Images Automatic Recognition in Internet of DronesZhe Zhang, Shiyao Ma, Zhaohui Yang et al.
Air access networks have been recognized as a significant driver of various Internet of Things (IoT) services and applications. In particular, the aerial computing network infrastructure centered on the Internet of Drones has set off a new revolution in automatic image recognition. This emerging technology relies on sharing ground truth labeled data between Unmanned Aerial Vehicle (UAV) swarms to train a high-quality automatic image recognition model. However, such an approach will bring data privacy and data availability challenges. To address these issues, we first present a Semi-supervised Federated Learning (SSFL) framework for privacy-preserving UAV image recognition. Specifically, we propose model parameters mixing strategy to improve the naive combination of FL and semi-supervised learning methods under two realistic scenarios (labels-at-client and labels-at-server), which is referred to as Federated Mixing (FedMix). Furthermore, there are significant differences in the number, features, and distribution of local data collected by UAVs using different camera modules in different environments, i.e., statistical heterogeneity. To alleviate the statistical heterogeneity problem, we propose an aggregation rule based on the frequency of the client's participation in training, namely the FedFreq aggregation rule, which can adjust the weight of the corresponding local model according to its frequency. Numerical results demonstrate that the performance of our proposed method is significantly better than those of the current baseline and is robust to different non-IID levels of client data.
DCAug 12, 2021
A Contract Theory based Incentive Mechanism for Federated LearningMengmeng Tian, Yuxin Chen, Yuan Liu et al.
Federated learning (FL) serves as a data privacy-preserved machine learning paradigm, and realizes the collaborative model trained by distributed clients. To accomplish an FL task, the task publisher needs to pay financial incentives to the FL server and FL server offloads the task to the contributing FL clients. It is challenging to design proper incentives for the FL clients due to the fact that the task is privately trained by the clients. This paper aims to propose a contract theory based FL task training model towards minimizing incentive budget subject to clients being individually rational (IR) and incentive compatible (IC) in each FL training round. We design a two-dimensional contract model by formally defining two private types of clients, namely data quality and computation effort. To effectively aggregate the trained models, a contract-based aggregator is proposed. We analyze the feasible and optimal contract solutions to the proposed contract model. %Experimental results demonstrate that the proposed framework and contract model can effective improve the generation accuracy of FL tasks. Experimental results show that the generalization accuracy of the FL tasks can be improved by the proposed incentive mechanism where contract-based aggregation is applied.
SYDec 4, 2020
Cross-Layer Coordinated Attacks on Cyber-Physical Systems: A LQG Game Framework with Controlled ObservationsYunhan Huang, Zehui Xiong, Quanyan Zhu
This work establishes a game-theoretic framework to study cross-layer coordinated attacks on cyber-physical systems (CPSs). The attacker can interfere with the physical process and launch jamming attacks on the communication channels simultaneously. At the same time, the defender can dodge the jamming by dispensing with observations. The generic framework captures a wide variety of classic attack models on CPSs. Leveraging dynamic programming techniques, we fully characterize the Subgame Perfect Equilibrium (SPE) control strategies. We also derive the SPE observation and jamming strategies and provide efficient computational methods to compute them. The results demonstrate that the physical and cyber attacks are coordinated and depend on each other. On the one hand, the control strategies are linear in the state estimate, and the estimate error caused by jamming attacks will induce performance degradation. On the other hand, the interactions between the attacker and the defender in the physical layer significantly impact the observation and jamming strategies. Numerical examples illustrate the interactions between the defender and the attacker through their observation and jamming strategies.