Zhaohui Yang

LG
h-index: 83
75 papers
9,250 citations
Novelty: 45%
AI Score: 58

75 Papers

IT · Jul 19, 2022
Beyond Transmitting Bits: Context, Semantics, and Task-Oriented Communications

Deniz Gunduz, Zhijin Qin, Inaki Estella Aguerri et al.

Communication systems to date primarily aim at reliably communicating bit sequences. Such an approach provides efficient engineering designs that are agnostic to the meanings of the messages or to the goal that the message exchange aims to achieve. Next generation systems, however, can be potentially enriched by folding message semantics and goals of communication into their design. Further, these systems can be made cognizant of the context in which communication exchange takes place, providing avenues for novel design insights. This tutorial summarizes the efforts to date, starting from its early adaptations, semantic-aware and task-oriented communications, covering the foundations, algorithms and potential implementations. The focus is on approaches that utilize information theory to provide the foundations, as well as the significant role of learning in semantics and task-aware communications.

AI · Nov 2, 2022
Explainable AI over the Internet of Things (IoT): Overview, State-of-the-Art and Future Directions

Senthil Kumar Jagatheesaperumal, Quoc-Viet Pham, Rukhsana Ruby et al.

Explainable Artificial Intelligence (XAI) is transforming the field of Artificial Intelligence (AI) by enhancing the trust of end-users in machines. As the number of connected devices keeps growing, the Internet of Things (IoT) market needs to be trustworthy for the end-users. However, existing literature still lacks a systematic and comprehensive survey on the use of XAI for IoT. To bridge this gap, in this paper, we address the XAI frameworks with a focus on their characteristics and support for IoT. We illustrate the widely-used XAI services for IoT applications, such as security enhancement, Internet of Medical Things (IoMT), Industrial IoT (IIoT), and Internet of City Things (IoCT). We also suggest the implementation choice of XAI models over IoT systems in these applications with appropriate examples and summarize the key inferences for future work. Moreover, we present the cutting-edge development in edge XAI structures and the support of sixth-generation (6G) communication services for IoT applications, along with key inferences. In a nutshell, this paper constitutes the first holistic compilation on the development of XAI-based frameworks tailored for the demands of future IoT use cases.

LG · Oct 7, 2022
Over-the-Air Split Machine Learning in Wireless MIMO Networks

Yuzhi Yang, Zhaoyang Zhang, Yuqing Tian et al.

In split machine learning (ML), different partitions of a neural network (NN) are executed by different computing nodes, which incurs a large communication cost. To ease the communication burden, over-the-air computation (OAC) can efficiently implement all or part of the computation at the same time as communication. Based on the proposed system, the system implementation over a wireless network is introduced and the problem formulation is provided. In particular, we show that the inter-layer connection in an NN of any size can be mathematically decomposed into a set of linear precoding and combining transformations over MIMO channels. Therefore, the precoding matrix at the transmitter and the combining matrix at the receiver of each MIMO link, as well as the channel matrix itself, can jointly serve as a fully connected layer of the NN. The generalization of the proposed scheme to the conventional NNs is also introduced. Finally, we extend the proposed scheme to the widely used convolutional neural networks and demonstrate its effectiveness under both the static and quasi-static memory channel conditions with comprehensive simulations. In such a split ML system, the precoding and combining matrices are regarded as trainable parameters, while the MIMO channel matrix is regarded as an unknown (implicit) parameter.
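
The claim that precoding, channel, and combining matrices can jointly act as a fully connected layer can be illustrated with a minimal sketch. It assumes a real-valued, noise-free channel and hypothetical dimensions; the names `P`, `H`, `C` and all sizes are illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer and antenna sizes (assumptions for illustration only).
n_in, n_tx, n_rx, n_out = 8, 4, 4, 3

P = rng.standard_normal((n_tx, n_in))   # precoding matrix (trainable, at the transmitter)
H = rng.standard_normal((n_rx, n_tx))   # MIMO channel matrix (implicit, not trained)
C = rng.standard_normal((n_out, n_rx))  # combining matrix (trainable, at the receiver)

x = rng.standard_normal(n_in)           # activations leaving the transmitter-side partition

# Signal path: precode, pass through the channel, then combine.
y_over_the_air = C @ (H @ (P @ x))

# The same mapping written as one dense layer with effective weight W = C H P.
W_effective = C @ H @ P
y_dense = W_effective @ x
```

In this simplified picture the effective weight has rank at most `min(n_tx, n_rx)`, which is one reason a single link cannot represent an arbitrary layer and a set of such transformations is needed.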

IT · Mar 9, 2023
Robust Millimeter Beamforming via Self-Supervised Hybrid Deep Learning

Fenghao Zhu, Bohao Wang, Zhaohui Yang et al.

Beamforming with large-scale antenna arrays has been widely used in recent years and is acknowledged as an important part of 5G and upcoming 6G. Thus, various techniques are leveraged to improve its performance, e.g., deep learning and advanced optimization algorithms. Although its performance with deep learning is quite attractive in many previously studied scenarios, it usually drops rapidly when the environment or dataset changes. Therefore, designing an effective beamforming network with strong robustness is an open issue for intelligent wireless communications. In this paper, we propose a robust self-supervised beamforming network and verify it on two different datasets with various scenarios. Simulation results show that the proposed self-supervised network with hybrid learning performs well on both the classic DeepMIMO dataset and the new WAIR-D dataset, with strong robustness under various environments. We also present the principle that explains the rationale of this kind of hybrid learning, which is instructive for applying it to more kinds of datasets.

LG · Jan 3, 2023
Distributed Machine Learning for UAV Swarms: Computing, Sensing, and Semantics

Yahao Ding, Zhaohui Yang, Quoc-Viet Pham et al.

Unmanned aerial vehicle (UAV) swarms are considered a promising technique for next-generation communication networks due to their flexibility, mobility, low cost, and the ability to collaboratively and autonomously provide services. Distributed learning (DL) enables UAV swarms to intelligently provide communication services, multi-directional remote surveillance, and target tracking. In this survey, we first introduce several popular DL algorithms such as federated learning (FL), multi-agent reinforcement learning (MARL), distributed inference, and split learning, and present a comprehensive overview of their applications for UAV swarms, such as trajectory design, power control, wireless resource allocation, user assignment, perception, and satellite communications. Then, we present several state-of-the-art applications of UAV swarms in wireless communication systems, such as reconfigurable intelligent surface (RIS), virtual reality (VR), and semantic communications, and discuss the problems and challenges that DL-enabled UAV swarms can solve in these applications. Finally, we describe open problems of using DL in UAV swarms and future research directions of DL-enabled UAV swarms. In summary, this survey provides a comprehensive overview of various DL applications for UAV swarms in extensive scenarios.

CR · Jun 6, 2023
Adversarial Attacks and Defenses for Semantic Communication in Vehicular Metaverses

Jiawen Kang, Jiayi He, Hongyang Du et al.

For vehicular metaverses, one of the ultimate user-centric goals is to optimize the immersive experience and Quality of Service (QoS) for users on board. Semantic Communication (SemCom) has been introduced as a revolutionary paradigm that significantly eases communication resource pressure for vehicular metaverse applications to achieve this goal. SemCom enables high-quality and ultra-efficient vehicular communication, even with explosively increasing data traffic among vehicles. In this article, we propose a hierarchical SemCom-enabled vehicular metaverses framework consisting of the global metaverse, local metaverses, SemCom module, and resource pool. The global and local metaverses are brand-new concepts from the metaverse's distribution standpoint. Considering the QoS of users, this article explores the potential security vulnerabilities of the proposed framework. To that end, this study highlights a specific security risk to the framework's SemCom module and offers a viable defense solution, thereby encouraging community researchers to focus more on vehicular metaverse security. Finally, we provide an overview of the open issues of secure SemCom in the vehicular metaverses, notably pointing out potential future research directions.

CR · Aug 25, 2022
On Differential Privacy for Federated Learning in Wireless Systems with Multiple Base Stations

Nima Tavangaran, Mingzhe Chen, Zhaohui Yang et al.

In this work, we consider a federated learning model in a wireless system with multiple base stations and inter-cell interference. We apply a differentially private scheme to transmit information from users to their corresponding base station during the learning phase. We show the convergence behavior of the learning process by deriving an upper bound on its optimality gap. Furthermore, we define an optimization problem to reduce this upper bound and the total privacy leakage. To find the locally optimal solutions of this problem, we first propose an algorithm that schedules the resource blocks and users. We then extend this scheme to reduce the total privacy leakage by optimizing the differential privacy artificial noise. We apply the solutions of these two procedures as parameters of a federated learning system. In this setting, we assume that each user is equipped with a classifier. Moreover, the communication cells are assumed to have, in most cases, fewer resource blocks than users. The simulation results show that our proposed scheduler improves the average accuracy of the predictions compared with a random scheduler. Furthermore, its extended version with a noise optimizer significantly reduces the amount of privacy leakage.
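
As a rough illustration of what adding differential privacy artificial noise to a user's transmitted update can look like, the sketch below uses the standard clip-then-add-Gaussian-noise recipe. The clipping step, the function name `privatize_update`, and all parameter values are assumptions for illustration; the abstract does not specify the paper's exact mechanism or how the noise level is optimized.

```python
import numpy as np

def privatize_update(update, clip_norm, noise_std, rng):
    """Clip an update to a maximum L2 norm, then add Gaussian noise.

    Hypothetical helper: a generic Gaussian-mechanism sketch, not the
    scheme from the paper.
    """
    norm = np.linalg.norm(update)
    # Scale down only if the update exceeds the clipping threshold.
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = update * scale
    # Artificial noise; a larger noise_std means less leakage but a noisier update.
    return clipped + rng.normal(0.0, noise_std, size=update.shape)

rng = np.random.default_rng(0)
local_update = np.array([3.0, 4.0])  # L2 norm is 5
noiseless = privatize_update(local_update, clip_norm=1.0, noise_std=0.0, rng=rng)
noisy = privatize_update(local_update, clip_norm=1.0, noise_std=0.5, rng=rng)
```

The noise standard deviation is the knob that trades accuracy against leakage, which is the kind of quantity the abstract describes optimizing.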

LG · Aug 26, 2024
Hyperdimensional Computing Empowered Federated Foundation Model over Wireless Networks for Metaverse

Yahao Ding, Wen Shang, Minrui Xu et al.

The Metaverse, a burgeoning collective virtual space merging augmented reality and persistent virtual worlds, necessitates advanced artificial intelligence (AI) and communication technologies to support immersive and interactive experiences. Federated learning (FL) has emerged as a promising technique for collaboratively training AI models while preserving data privacy. However, FL faces challenges such as high communication overhead and substantial computational demands, particularly for neural network (NN) models. To address these issues, we propose an integrated federated split learning and hyperdimensional computing (FSL-HDC) framework for emerging foundation models. This novel approach reduces communication costs, computation load, and privacy risks, making it particularly suitable for resource-constrained edge devices in the Metaverse, ensuring real-time responsive interactions. Additionally, we introduce an optimization algorithm that concurrently optimizes transmission power and bandwidth to minimize the maximum transmission time among all users to the server. The simulation results based on the MNIST dataset indicate that FSL-HDC achieves an accuracy rate of approximately 87.5%, which is slightly lower than that of FL-HDC. However, FSL-HDC exhibits a significantly faster convergence speed, approximately 3.733x that of FSL-NN, and demonstrates robustness to non-IID data distributions. Moreover, our proposed optimization algorithm can reduce the maximum transmission time by up to 64% compared with the baseline.

IT · Apr 7
Wireless Large AI Model: Shaping the AI-Native Future of 6G and Beyond

Fenghao Zhu, Xinquan Wang, Siming Jiang et al.

The emergence of sixth-generation and beyond communication systems is expected to fundamentally transform digital experiences through introducing unparalleled levels of intelligence, efficiency, and connectivity. A promising technology poised to enable this revolutionary vision is a wireless large AI model (WLAM), characterized by its exceptional capabilities in data processing, inference, and decision-making. In light of these remarkable capabilities, this paper provides a comprehensive survey of WLAM, explaining its fundamental principles, diverse applications, critical challenges, and future research opportunities. We begin by introducing the background of WLAM and analyzing the key synergies with wireless networks, emphasizing the mutual benefits. Subsequently, we explore the foundational characteristics of WLAM, delving into their unique relevance in wireless environments. Then, the role of WLAM in optimizing wireless communication systems across various use cases and the reciprocal benefits are systematically investigated. Furthermore, we discuss the integration of WLAM with emerging technologies, highlighting their potential to enable transformative capabilities and breakthroughs in wireless communication. Finally, we thoroughly examine the high-level challenges and discuss pivotal future research directions.

IT · May 1
Split and Aggregation Learning for Foundation Models Over Mobile Embodied AI Network (MEAN): A Comprehensive Survey

Qianzhou Chen, Siqi Sun, Minrui Xu et al.

The rapid advancements in foundation models and sixth-generation (6G) wireless communication systems necessitate the development of efficient, scalable, and privacy-preserving machine learning approaches. For foundation models in 6G, split learning (SL) and aggregation learning (AL) have emerged as promising paradigms that address key challenges in distributed artificial intelligence (AI), such as communication efficiency, resource allocation, and data privacy. SL enables multiple entities to collaboratively train deep learning models by partitioning neural networks, while AL focuses on aggregating intermediate results or model updates from multiple participants, improving robustness, optimizing resource utilization, and mitigating data leakage risks. Specifically, SL is ideal for scenarios requiring strict data isolation (e.g., vertical collaborations), whereas AL suits homogeneous horizontal data settings; they can be combined to balance privacy and communication efficiency. This survey provides a comprehensive analysis of SL and AL in 6G communication systems, exploring their architectures, technical methodologies, and integration with AI-native 6G communication technologies. We examine different SL configurations, aggregation techniques, and their roles in optimizing distributed foundation models. Furthermore, we discuss their applications in emerging wireless networks, including semantic communication, reconfigurable intelligent surfaces (RIS), space-air-ground integrated networks (SAGINs), and quantum communication. By analyzing the impact of SL and AL, this survey provides insights into their role in shaping distributed AI-driven communication systems in the 6G era, focusing on efficiency, privacy preservation, and scalability.

CV · Nov 29, 2022
TF-Net: Deep Learning Empowered Tiny Feature Network for Night-time UAV Detection

Maham Misbah, Misha Urooj Khan, Zhaohui Yang et al.

Technological advancements have normalized the usage of unmanned aerial vehicles (UAVs) in every sector, spanning from military to commercial, but they also pose serious security concerns due to their enhanced functionalities and easy access to private and highly secured areas. Several instances related to UAVs have raised security concerns, leading to UAV detection research studies. Visual techniques are widely adopted for UAV detection, but they perform poorly at night, in complex backgrounds, and in adverse weather conditions. Therefore, a robust night vision-based drone detection system is required that can efficiently tackle this problem. Infrared cameras are increasingly used for nighttime surveillance due to their wide applications in night vision equipment. This paper uses a deep learning-based TinyFeatureNet (TF-Net), which is an improved version of YOLOv5s, to accurately detect UAVs during the night using infrared (IR) images. In the proposed TF-Net, we introduce architectural changes in the neck and backbone of the YOLOv5s. We also simulated four different YOLOv5 models (s, m, n, l) and the proposed TF-Net for a fair comparison. The results showed better performance for the proposed TF-Net in terms of precision, IoU, GFLOPS, model size, and FPS compared to the YOLOv5s. TF-Net yielded the best results with 95.7% precision, 84% mAP, and 44.8% IoU.

CL · Sep 16, 2023
Semantic Information Extraction for Text Data with Probability Graph

Zhouxiang Zhao, Zhaohui Yang, Ye Hu et al.

In this paper, the problem of semantic information extraction for resource-constrained text data transmission is studied. In the considered model, a sequence of text data needs to be transmitted within a communication resource-constrained network, which only allows limited data transmission. Thus, at the transmitter, semantic information is extracted from the original text data with natural language processing techniques. Then, the extracted semantic information is captured in a knowledge graph. An additional probability dimension is introduced in this graph to capture the importance of each piece of information. This semantic information extraction problem is posed as an optimization framework whose goal is to extract the most important semantic information for transmission. To find an optimal solution for this problem, a Floyd's algorithm based solution coupled with an efficient sorting mechanism is proposed. Numerical results demonstrate the effectiveness of the proposed algorithm with regard to two novel performance metrics: semantic uncertainty and semantic similarity.
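
For readers unfamiliar with the Floyd's algorithm mentioned above, a generic sketch of the Floyd-Warshall all-pairs shortest-path procedure follows. The toy graph and its weights are hypothetical, and the abstract does not describe how the paper maps the probability graph onto edge weights or how the sorting mechanism is combined with it, so this shows only the textbook algorithm.

```python
import math

def floyd_warshall(weights):
    """Return all-pairs shortest-path distances for a weighted adjacency matrix."""
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input is not modified
    for k in range(n):          # allow node k as an intermediate stop
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

INF = math.inf
# Hypothetical 4-node undirected graph; INF marks a missing edge.
graph = [
    [0,   4,   INF, 1],
    [4,   0,   2,   INF],
    [INF, 2,   0,   5],
    [1,   INF, 5,   0],
]
shortest = floyd_warshall(graph)
```

The triple loop costs O(n^3) time, which is acceptable for the small graphs typically extracted from a short text sequence.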

LG · Sep 26, 2023
Distortion Resilience for Goal-Oriented Semantic Communication

Minh-Duong Nguyen, Quang-Vinh Do, Zhaohui Yang et al.

Recent research efforts on Semantic Communication (SemCom) have mostly considered accuracy as a main problem for optimizing goal-oriented communication systems. However, these approaches introduce a paradox: the accuracy of Artificial Intelligence (AI) tasks should naturally emerge through training rather than being dictated by network constraints. Acknowledging this dilemma, this work introduces an innovative approach that leverages rate-distortion theory to analyze distortions induced by communication and compression, thereby analyzing the learning process. Specifically, we examine the distribution shift between the original data and the distorted data, thus assessing its impact on the AI model's performance. Building on this analysis, we can preemptively estimate the empirical accuracy of AI tasks, making the goal-oriented SemCom problem feasible. To achieve this objective, we present the theoretical foundation of our approach, accompanied by simulations and experiments that demonstrate its effectiveness. The experimental results indicate that our proposed method enables accurate AI task performance while adhering to network constraints, establishing it as a valuable contribution to the field of signal processing. Furthermore, this work advances research in goal-oriented SemCom and highlights the significance of data-driven approaches in optimizing the performance of intelligent systems.

NI · Jul 12, 2024
FedsLLM: Federated Split Learning for Large Language Models over Communication Networks

Kai Zhao, Zhaohui Yang, Chongwen Huang et al.

Addressing the challenges of deploying large language models in wireless communication networks, this paper combines low-rank adaptation technology (LoRA) with the splitfed learning framework to propose the federated split learning for large language models (FedsLLM) framework. The method introduced in this paper utilizes LoRA technology to reduce processing loads by dividing the network into client subnetworks and server subnetworks. It leverages a federated server to aggregate and update client models. As the training data are transmitted through a wireless network between clients and both main and federated servers, the training delay is determined by the learning accuracy and the allocation of communication bandwidth. This paper models the minimization of the training delay by integrating computation and communication optimization, simplifying the optimization problem into a convex problem to find the optimal solution. Additionally, it presents a lemma that describes the precise solutions to this problem. Simulation results demonstrate that the proposed optimization algorithm reduces delays by an average of 47.63% compared to unoptimized scenarios.

LG · Feb 23
A Secure and Private Distributed Bayesian Federated Learning Design

Nuocheng Yang, Sihua Wang, Zhaohui Yang et al.

Distributed Federated Learning (DFL) enables decentralized model training across large-scale systems without a central parameter server. However, DFL faces three critical challenges: privacy leakage from honest-but-curious neighbors, slow convergence due to the lack of central coordination, and vulnerability to Byzantine adversaries aiming to degrade model accuracy. To address these issues, we propose a novel DFL framework that integrates Byzantine robustness, privacy preservation, and convergence acceleration. Within this framework, each device trains a local model using a Bayesian approach and independently selects an optimal subset of neighbors for posterior exchange. We formulate this neighbor selection as an optimization problem to minimize the global loss function under security and privacy constraints. Solving this problem is challenging because devices only possess partial network information, and the complex coupling between topology, security, and convergence remains unclear. To bridge this gap, we first analytically characterize the trade-offs between dynamic connectivity, Byzantine detection, privacy levels, and convergence speed. Leveraging these insights, we develop a fully distributed Graph Neural Network (GNN)-based Reinforcement Learning (RL) algorithm. This approach enables devices to make autonomous connection decisions based on local observations. Simulation results demonstrate that our method achieves superior robustness and efficiency with significantly lower overhead compared to traditional security and privacy schemes.

LG · Nov 22, 2023
A Joint Gradient and Loss Based Clustered Federated Learning Design

Licheng Lin, Mingzhe Chen, Zhaohui Yang et al.

In this paper, a novel clustered FL framework that enables distributed edge devices with non-IID data to independently form several clusters in a distributed manner and implement FL training within each cluster is proposed. In particular, our designed clustered FL algorithm must overcome two challenges associated with FL training. First, the server has limited FL training information (i.e., the parameter server can only obtain the FL model information of each device) and limited computational power for finding the differences among a large number of devices. Second, each device does not have the data information of other devices for device clustering and can only use global FL model parameters received from the server and its own data information to determine its cluster identity, which increases the difficulty of device clustering. To overcome these two challenges, we propose a joint gradient and loss based distributed clustering method in which each device determines its cluster identity considering the gradient similarity and training loss. The proposed clustering method considers not only how a local FL model of one device contributes to each cluster but also the direction of gradient descent, thus improving clustering speed. By delegating clustering decisions to edge devices, each device can fully leverage its private data information to determine its own cluster identity, thereby reducing clustering overhead and improving overall clustering performance. Simulation results demonstrate that our proposed clustered FL algorithm can reduce clustering iterations by up to 99% compared to the existing baseline.

IT · Apr 30, 2023
Self-information Domain-based Neural CSI Compression with Feature Coupling

Ziqing Yin, Renjie Xie, Wei Xu et al.

Deep learning (DL)-based channel state information (CSI) feedback methods compress the CSI matrix by exploiting its delay and angle features straightforwardly, while the measure in terms of information contained in the CSI matrix has rarely been considered. Based on this observation, we introduce self-information as an informative CSI representation from the perspective of information theory, which reflects the amount of information of the original CSI matrix in an explicit way. Then, a novel DL-based network is proposed for temporal CSI compression in the self-information domain, namely SD-CsiNet. The proposed SD-CsiNet projects the raw CSI onto a self-information matrix in the newly-defined self-information domain, extracts both temporal and spatial features of the self-information matrix, and then couples these two features for effective compression. Experimental results verify the effectiveness of the proposed SD-CsiNet by exploiting the self-information of CSI. Particularly for compression ratios 1/8 and 1/16, the SD-CsiNet respectively achieves 7.17 dB and 3.68 dB performance gains compared to state-of-the-art methods.

LG · Mar 11
Prioritizing Gradient Sign Over Modulus: An Importance-Aware Framework for Wireless Federated Learning

Yiyang Yue, Jiacheng Yao, Wei Xu et al.

Wireless federated learning (FL) facilitates collaborative training of artificial intelligence (AI) models to support ubiquitous intelligent applications at the wireless edge. However, the inherent constraints of limited wireless resources inevitably lead to unreliable communication, which poses a significant challenge to wireless FL. To overcome this challenge, we propose Sign-Prioritized FL (SP-FL), a novel framework that improves wireless FL by prioritizing the transmission of important gradient information through uneven resource allocation. Specifically, recognizing the importance of descent direction in model updating, we transmit gradient signs in individual packets and allow their reuse for gradient descent if the remaining gradient modulus cannot be correctly recovered. To further improve the reliability of transmission of important information, we formulate a hierarchical resource allocation problem based on the importance disparity at both the packet and device levels, optimizing bandwidth allocation across multiple devices and power allocation between sign and modulus packets. To make the problem tractable, the one-step convergence behavior of SP-FL, which characterizes data importance at both levels in an explicit form, is analyzed. We then propose an alternating optimization algorithm to solve this problem using the Newton-Raphson method and successive convex approximation (SCA). Simulation results confirm the superiority of SP-FL, especially in resource-constrained scenarios, demonstrating up to 9.96% higher testing accuracy on the CIFAR-10 dataset compared to existing methods.
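
The idea of sending gradient signs and moduli in separate packets, and reusing the signs when the modulus packet is lost, can be sketched as follows. The function name `reconstruct_gradient` and the constant `fallback_magnitude` are hypothetical; the abstract states only that signs are reused for gradient descent, not which magnitude is substituted.

```python
import numpy as np

def reconstruct_gradient(sign, modulus, modulus_ok, fallback_magnitude):
    """Rebuild a gradient at the server from separately transmitted parts.

    Hypothetical helper: if the modulus packet was decoded, recover the exact
    gradient; otherwise keep only the descent direction from the sign packet.
    """
    if modulus_ok:
        return sign * modulus
    return sign * fallback_magnitude

grad = np.array([0.5, -2.0, 0.0, 1.5])
sign, modulus = np.sign(grad), np.abs(grad)  # contents of the two packets

full = reconstruct_gradient(sign, modulus, modulus_ok=True, fallback_magnitude=0.1)
degraded = reconstruct_gradient(sign, modulus, modulus_ok=False, fallback_magnitude=0.1)
```

Even in the degraded case every nonzero coordinate still moves in the correct direction, which is the intuition behind giving the sign packet more protection than the modulus packet.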

SY · Apr 19
WirelessAgent: A Unified Agent Design for General Wireless Resource Allocation Problem without Current Channel State Information

Ran Yi, Ruopeng Xu, Dongshu Zhao et al.

This paper investigates the agent design for solving the wireless resource allocation problem without sufficient channel state information (CSI), which cannot be effectively solved via conventional methods. In the considered wireless agent design, we provide the general sense-repair-decide-act workflow, which can be used to intelligently solve general wireless resource allocation problems. A multi-objective optimization problem is formulated to adaptively satisfy different user requirements including both spectrum and energy efficiency. This work addresses the challenge of incomplete CSI for multiple optimization objectives. To solve this problem, we use an artificial intelligence (AI) model to predict missing channel data and construct an agent on the Coze platform, allowing the network operators to optimize multiple objectives through natural language conversations. To tackle the resource scheduling under different objectives, we develop adaptive algorithms. Simulation results validate the effectiveness of our proposed design, demonstrating that the proposed AI method reduces the root mean square error by up to approximately 67% compared to the traditional approach. Moreover, the data-driven scheduling balances system performance compared to conventional baseline approaches.

QUANT-PH · Feb 14
Reconfigurable Quantum Instruction Set Computers for High Performance Attainable on Hardware

Zhaohui Yang, Dawei Ding, Qi Ye et al.

The performance of current quantum hardware is severely limited. While expanding the quantum ISA with high-fidelity, expressive basis gates is a key path forward, it imposes significant gate calibration overhead and complicates compiler optimization. As a result, even though more powerful ISAs have been designed, their use remains largely conceptual rather than practical. To move beyond these hurdles, we introduce the concept of "reconfigurable quantum instruction set computers" (ReQISC), which incorporates: (1) a unified microarchitecture capable of directly implementing arbitrary 2Q gates equivalently, i.e., SU(4) modulo 1Q rotations, with theoretically optimal gate durations given any 2Q coupling Hamiltonians; (2) a compilation framework tailored to ReQISC primitives for end-to-end synthesis and optimization, comprising a program-aware pass that refines high-level representations, a program-agnostic pass for aggressive circuit-level optimization, and an SU(4)-aware routing pass that minimizes hardware mapping overhead. We detail the hardware implementation to demonstrate the feasibility, in terms of both pulse control and calibration of this superior gate scheme on realistic hardware. By leveraging the expressivity of SU(4) and the time minimality realized by the underlying microarchitecture, the SU(4)-based ISA achieves remarkable performance, with a 4.97-fold reduction in average pulse duration to implement arbitrary 2Q gates, compared to the usual CNOT/CZ scheme on mainstream flux-tunable transmons. Supported by the end-to-end compiler, ReQISC outperforms the conventional CNOT-ISA, SOTA compiler, and pulse implementation counterparts, in significantly reducing 2Q gate counts, circuit depth, pulse duration, qubit mapping overhead, and program fidelity losses. For the first time, ReQISC makes the theoretical benefits of continuous ISAs practically feasible.

RO · Sep 11, 2025 · Code
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning

Haozhan Li, Yuxin Zuo, Jiale Yu et al. · pku, tsinghua

Vision-Language-Action (VLA) models have recently emerged as a powerful paradigm for robotic manipulation. Despite substantial progress enabled by large-scale pretraining and supervised fine-tuning (SFT), these models face two fundamental challenges: (i) the scarcity and high cost of large-scale human-operated robotic trajectories required for SFT scaling, and (ii) limited generalization to tasks involving distribution shift. Recent breakthroughs in Large Reasoning Models (LRMs) demonstrate that reinforcement learning (RL) can dramatically enhance step-by-step reasoning capabilities, raising a natural question: Can RL similarly improve the long-horizon step-by-step action planning of VLA? In this work, we introduce SimpleVLA-RL, an efficient RL framework tailored for VLA models. Building upon veRL, we introduce VLA-specific trajectory sampling, scalable parallelization, multi-environment rendering, and optimized loss computation. When applied to OpenVLA-OFT, SimpleVLA-RL achieves SoTA performance on LIBERO and even outperforms $π_0$ on RoboTwin 1.0 & 2.0 with the exploration-enhancing strategies we introduce. SimpleVLA-RL not only reduces dependence on large-scale data and enables robust generalization, but also remarkably surpasses SFT in real-world tasks. Moreover, we identify a novel phenomenon "pushcut" during RL training, wherein the policy discovers previously unseen patterns beyond those seen in the previous training process. Github: https://github.com/PRIME-RL/SimpleVLA-RL

IT · May 14
Digital Twin Synchronization Over Mobile Embodied AI Network With Agentic Intelligence

Zhouxiang Zhao, Jiaxiang Wang, Yahao Ding et al.

Efficient digital twin (DT) synchronization relies on maintaining high-fidelity virtual representations with minimal age of information (AoI). However, the synergistic potential of cooperative sensing and autonomous mobility of the sensing agent remains underexplored in existing DT synchronization frameworks. In this paper, we propose an agentic AI-empowered mobile embodied AI network (MEAN) framework for DT synchronization. In the proposed hybrid architecture, the base station (BS) conducts global orchestration, while the agents autonomously execute a five-stage closed-loop workflow: move-to-sense, cooperative sensing, onboard semantic processing, channel-aware mobility, and uplink transmission. To optimize synchronization performance, we formulate a joint topology dispatching and multidimensional resource allocation problem aimed at minimizing the maximum twin deviation across regions, subject to heterogeneous sensing fidelity and energy budget constraints. To tackle this, we develop a hierarchical two-layer optimization algorithm, where the outer-layer refines multi-agent assignment via a dynamic matching game, and the inner-layer iteratively optimizes the continuous resources. Extensive simulation results verify the convergence of the proposed algorithm and demonstrate its substantial superiority over multiple baseline schemes in reducing synchronization deviation. Furthermore, the results reveal that semantic compression serves as a vital substitute for channel resources in latency reduction under constrained bandwidth, while autonomous velocity adaptation provides an essential degree of freedom for the system to navigate the fundamental energy-time trade-off.

LG · Apr 9, 2025 · Code
Analogical Learning for Cross-Scenario Generalization: Framework and Application to Intelligent Localization

Zirui Chen, Zhaoyang Zhang, Ziqing Xing et al.

Existing learning models often exhibit poor generalization when deployed across diverse scenarios. This is primarily because the underlying reference frame of the data varies with the deployment environment and settings. However, although the data of each scenario has a distinct reference frame, its generation generally follows common underlying physical rules. Based on this understanding, this article proposes a deep learning framework named analogical learning (AL), which implicitly retrieves the reference frame information associated with a scenario and then makes accurate predictions by relative analogy with other scenarios. Specifically, we design a bipartite neural network called Mateformer. Its first part captures the relativity within multiple latent feature spaces between the input data and a small amount of embedded data from the studied scenario, while its second part uses this relativity to guide the nonlinear analogy. We apply AL to the typical multi-scenario learning problem of intelligent wireless localization in cellular networks. Extensive experiments validate AL's superiority across three key dimensions. First, it achieves state-of-the-art accuracy in single-scenario benchmarks. Second, it demonstrates stable transferability between different scenarios, avoiding catastrophic forgetting. Finally, and most importantly, it robustly adapts to new, unseen scenarios, including dynamic weather and traffic conditions, without any tuning. All data and code are available at https://github.com/ziruichen-research/ALLoc.

IT · Jan 15
Codebook Design for Limited Feedback in Near-Field XL-MIMO Systems

Liujia Yao, Changsheng You, Zixuan Huang et al.

In this paper, we study efficient codebook design for limited feedback in extremely large-scale multiple-input-multiple-output (XL-MIMO) frequency division duplexing (FDD) systems. Existing codebook designs for XL-MIMO, such as the polar-domain codebook, do not adequately account for the practical user (location) distribution, thereby incurring excessive feedback overhead. To address this issue, we propose a novel and efficient feedback codebook tailored to the user distribution. To this end, we first consider a typical scenario where users are uniformly distributed within a specific polar region, based on which a sum-rate maximization problem is formulated to jointly optimize angle-range samples and bit allocation among angle/range feedback. This problem is challenging to solve due to the lack of a closed-form expression for the received power in terms of angle and range samples. By leveraging a Voronoi partitioning approach, we show that uniform angle sampling is optimal for received power maximization. For the more challenging range sampling design, we obtain a tight lower bound on the received power and show that geometric sampling, where the ratio between adjacent samples is constant, maximizes the lower bound and thus serves as a high-quality suboptimal solution. We then extend the proposed framework to accommodate more general non-uniform user distributions via an alternating sampling method. Furthermore, theoretical analysis reveals that as the array size increases, the optimal allocation of feedback bits increasingly favors range samples at the expense of angle samples. Finally, numerical results validate the superior rate performance and robustness of the proposed codebook design under various system setups, achieving significant gains over benchmark schemes, including the widely used polar-domain codebook.
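
The two sampling rules named in the abstract (uniform angle samples, geometric range samples with a constant ratio between neighbours) can be illustrated with a minimal sketch. The function names, the cell-centre placement of angle samples, the endpoints, and the numeric values are all assumptions made for illustration; the abstract does not state whether the angle variable is the physical angle or its sine, nor how the sample endpoints are chosen.

```python
import math

def uniform_angle_samples(theta_min, theta_max, n):
    """Return n angle samples at the centres of n equal-width cells.

    Assumption: samples sit at cell centres; the paper may place them differently.
    """
    width = (theta_max - theta_min) / n
    return [theta_min + (i + 0.5) * width for i in range(n)]

def geometric_range_samples(r_min, r_max, n):
    """Return n range samples whose adjacent ratio r[i+1] / r[i] is constant."""
    if n == 1:
        # Single sample: use the geometric mean of the interval.
        return [math.sqrt(r_min * r_max)]
    ratio = (r_max / r_min) ** (1.0 / (n - 1))
    return [r_min * ratio ** i for i in range(n)]

# Hypothetical sector: angles in [-60, 60] degrees, ranges from 5 m to 80 m.
angles = uniform_angle_samples(-math.pi / 3, math.pi / 3, 4)
ranges = geometric_range_samples(5.0, 80.0, 5)

for r_prev, r_next in zip(ranges, ranges[1:]):
    print(f"{r_prev:6.2f} -> {r_next:6.2f}  ratio = {r_next / r_prev:.3f}")
```

Geometric spacing places more samples at short range, where the near-field beam focus changes fastest with distance, and fewer at long range.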

AI · May 20, 2025 · Code
Beyond the First Error: Process Reward Models for Reflective Mathematical Reasoning

Zhaohui Yang, Chenghua He, Xiaowen Shi et al.

Many studies focus on data annotation techniques for training effective PRMs. However, current methods encounter a significant issue when applied to long CoT reasoning processes: they tend to focus solely on the first incorrect step and all preceding steps, assuming that all subsequent steps are incorrect. These methods overlook the unique self-correction and reflection mechanisms inherent in long CoT, where correct reasoning steps may still occur after initial reasoning mistakes. To address this issue, we propose a novel data annotation method for PRMs specifically designed to score the long CoT reasoning process. Given that under the reflection pattern, correct and incorrect steps often alternate, we introduce the concepts of Error Propagation and Error Cessation, enhancing PRMs' ability to identify both effective self-correction behaviors and reasoning based on erroneous steps. Leveraging an LLM-based judger for annotation, we collect 1.7 million data samples to train a 7B PRM and evaluate it at both solution and step levels. Experimental results demonstrate that compared to existing open-source PRMs and PRMs trained on open-source datasets, our PRM achieves superior performance across various metrics, including search guidance, BoN, and F1 scores. Compared to widely used MC-based annotation methods, our annotation approach not only achieves higher data efficiency but also delivers superior performance. Detailed analysis is also conducted to demonstrate the stability and generalizability of our method.
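
The contrast the abstract draws between first-error annotation and step-level annotation of a reflective chain of thought can be sketched as follows. This is only an illustration under stated assumptions: the per-step verdicts are hypothetical, the function names are invented here, and the paper's actual annotation procedure (including how Error Propagation and Error Cessation are detected) is not described in the abstract.

```python
def first_error_labels(step_ok):
    """Label steps 1 until the first error, then 0 for every later step.

    This mirrors the assumption criticised in the abstract: everything after
    the first incorrect step is treated as incorrect.
    """
    labels, seen_error = [], False
    for ok in step_ok:
        seen_error = seen_error or not ok
        labels.append(0 if seen_error else 1)
    return labels

def per_step_labels(step_ok):
    """Keep an individual verdict for every step, including post-error steps."""
    return [1 if ok else 0 for ok in step_ok]

# Hypothetical judge verdicts for a 6-step solution: step 3 is wrong,
# step 4 builds on that mistake, step 5 reflects and corrects it,
# and step 6 continues correctly.
verdicts = [True, True, False, False, True, True]

print(first_error_labels(verdicts))  # [1, 1, 0, 0, 0, 0]
print(per_step_labels(verdicts))     # [1, 1, 0, 0, 1, 1]
```

Under this toy reading, step 4 would be an instance of error propagation and step 5 of error cessation; the first-error scheme cannot distinguish the two.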

CV · Mar 26, 2020 · Code
Hit-Detector: Hierarchical Trinity Architecture Search for Object Detection

Jianyuan Guo, Kai Han, Yunhe Wang et al.

Neural Architecture Search (NAS) has achieved great success in image classification tasks. Some recent works have managed to explore the automatic design of an efficient backbone or feature fusion layer for object detection. However, these methods focus on searching only one component of the object detector while leaving the others manually designed. We find that the inconsistency between the searched component and the manually designed ones prevents the detector from reaching stronger performance. To this end, we propose a hierarchical trinity search framework to simultaneously discover efficient architectures for all components (i.e., backbone, neck, and head) of the object detector in an end-to-end manner. In addition, we empirically reveal that different parts of the detector prefer different operators. Motivated by this, we employ a novel scheme to automatically screen different sub search spaces for different components so as to perform the end-to-end search for each component on the corresponding sub search space efficiently. Without bells and whistles, our searched architecture, namely Hit-Detector, achieves 41.4% mAP on the COCO minival set with 27M parameters. Our implementation is available at https://github.com/ggjy/HitDet.pytorch.

QUANT-PH · Mar 23
Optimal Compilation of Syndrome Extraction Circuits for General Quantum LDPC Codes

Kai Zhang, Dingchao Gao, Zhaohui Yang et al.

Quantum error correcting codes (QECC) are essential for constructing large-scale quantum computers that deliver faithful results. As strong competitors to the conventional surface code, quantum low-density parity-check (qLDPC) codes are emerging rapidly: they offer high encoding rates while maintaining reasonable physical-qubit connectivity requirements. Despite the existence of numerous code constructions, a notable gap persists between these designs -- some of which remain purely theoretical -- and their circuit-level deployment. In this work, we propose Auto-Stabilizer-Check (ASC), a universal compilation framework that generates depth-optimal syndrome extraction circuits for arbitrary qLDPC codes. ASC leverages the sparsity of parity-check matrices and exploits the commutativity of X and Z stabilizer measurement subroutines to search for optimal compilation schemes. By iteratively invoking an SMT solver, ASC returns a depth-optimal solution if a satisfying assignment is found, and a near-optimal solution in cases of solver timeouts. Notably, ASC provides the first definitive answer to one of IBM's open problems: for all instances of bivariate bicycle (BB) code reported in their work, our compiler certifies that no depth-6 syndrome extraction circuit exists. Furthermore, by integrating ASC with an end-to-end evaluation framework -- one that assesses different compilation settings under a circuit-level noise model -- ASC reduces circuit depth by approximately 50% and achieves an average 7x-8x suppression of the logical error rate for general qLDPC codes, compared with as-soon-as-possible (ASAP) and coloration-based scheduling. ASC thus substantially reduces manual design overhead and demonstrates its strong potential to serve as a key component in accelerating hardware deployment of qLDPC codes.
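
The idea of repeatedly asking a solver "is there a circuit of depth d?" for increasing d can be sketched on a toy scheduling problem. Everything here is an assumption for illustration: a plain backtracking search stands in for the SMT solver, the only constraint modelled is that no check or data qubit takes part in two gates in the same time step, and the ordering constraints between X and Z stabilizer measurements that the abstract mentions are ignored. The function and variable names are hypothetical and are not taken from ASC.

```python
def schedule_with_depth(gates, depth):
    """Backtracking stand-in for the solver call.

    Assign each (check, qubit) gate a time step in range(depth) so that no
    check or qubit is used twice in one step. Returns the list of steps
    (aligned with gates), or None if no such assignment exists.
    """
    busy = [set() for _ in range(depth)]  # nodes occupied at each time step
    steps = [None] * len(gates)

    def place(i):
        if i == len(gates):
            return True
        check, qubit = gates[i]
        for t in range(depth):
            if check not in busy[t] and qubit not in busy[t]:
                busy[t].update((check, qubit))
                steps[i] = t
                if place(i + 1):
                    return True
                busy[t].difference_update((check, qubit))
        return False

    return list(steps) if place(0) else None

def minimal_depth_schedule(gates, max_depth):
    """Try depths from the degree lower bound upward; the first hit is minimal."""
    degree = {}
    for check, qubit in gates:
        degree[check] = degree.get(check, 0) + 1
        degree[qubit] = degree.get(qubit, 0) + 1
    lower_bound = max(degree.values())
    for depth in range(lower_bound, max_depth + 1):
        steps = schedule_with_depth(gates, depth)
        if steps is not None:
            return depth, steps
    return None  # nothing found within max_depth

# Toy parity checks: c0 acts on q0, q1, q2 and c1 acts on q1, q2, q3.
toy_gates = [("c0", "q0"), ("c0", "q1"), ("c0", "q2"),
             ("c1", "q1"), ("c1", "q2"), ("c1", "q3")]
result = minimal_depth_schedule(toy_gates, max_depth=6)
print(result)
```

Because depths are tried in increasing order and each check is exhaustive, the first feasible depth is optimal for this toy model. A solver that times out instead of answering "infeasible" would only give an upper bound, which matches the near-optimal fallback described in the abstract.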

QUANT-PH · Jul 13, 2025
PHOENIX: Pauli-Based High-Level Optimization Engine for Instruction Execution on NISQ Devices

Zhaohui Yang, Dawei Ding, Chenghong Zhu et al.

Variational quantum algorithms (VQA) based on Hamiltonian simulation represent a specialized class of quantum programs well-suited for near-term quantum computing applications due to their modest resource requirements in terms of qubits and circuit depth. Unlike the conventional single-qubit (1Q) and two-qubit (2Q) gate sequence representation, Hamiltonian simulation programs are essentially composed of disciplined subroutines known as Pauli exponentiations (Pauli strings with coefficients) that are variably arranged. To capitalize on these distinct program features, this study introduces PHOENIX, a highly effective compilation framework that primarily operates at the high-level Pauli-based intermediate representation (IR) for generic Hamiltonian simulation programs. PHOENIX exploits global program optimization opportunities to the greatest extent, compared to existing SOTA methods, even though some of them also use similar IRs. Experimental results demonstrate that PHOENIX outperforms SOTA VQA compilers across diverse program categories, backend ISAs, and hardware topologies.
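
For readers unfamiliar with the term, a Pauli exponentiation can be written out as below. This is textbook material rather than anything specific to PHOENIX; the symbols $P_j$, $c_j$, $\theta$ and the first-order product formula are chosen here for illustration, and some references use $\theta/2$ in place of $\theta$.

```latex
% A Pauli string is a tensor product of single-qubit Paulis, so P^2 = I and
\[
  e^{-i\theta P} \;=\; \cos\theta \, I \;-\; i \sin\theta \, P,
  \qquad P \in \{I, X, Y, Z\}^{\otimes n}.
\]
% A Hamiltonian simulation program for H = \sum_j c_j P_j is then a sequence
% of such blocks, e.g. under a first-order product formula:
\[
  e^{-iHt} \;\approx\; \prod_j e^{-i c_j t \, P_j}.
\]
```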

NI · Mar 6, 2025
Large-Scale AI in Telecom: Charting the Roadmap for Innovation, Scalability, and Enhanced Digital Experiences

Adnan Shahid, Adrian Kliks, Ahmed Al-Tahmeesschi et al.

This white paper discusses the role of large-scale AI in the telecommunications industry, with a specific focus on the potential of generative AI to revolutionize network functions and user experiences, especially in the context of 6G systems. It highlights the development and deployment of Large Telecom Models (LTMs), which are tailored AI models designed to address the complex challenges faced by modern telecom networks. The paper covers a wide range of topics, from the architecture and deployment strategies of LTMs to their applications in network management, resource allocation, and optimization. It also explores the regulatory, ethical, and standardization considerations for LTMs, offering insights into their future integration into telecom infrastructure. The goal is to provide a comprehensive roadmap for the adoption of LTMs to enhance scalability, performance, and user-centric innovation in telecom networks.

IT · Dec 3, 2024
On Privacy, Security, and Trustworthiness in Distributed Wireless Large AI Models (WLAM)

Zhaohui Yang, Wei Xu, Le Liang et al.

Combining wireless communication with large artificial intelligence (AI) models can open up a myriad of novel application scenarios. In sixth generation (6G) networks, ubiquitous communication and computing resources allow large AI models to provide democratized large-AI-model services for real-time applications such as autonomous vehicles, smart cities, and Internet of Things (IoT) ecosystems. However, security considerations and the need for sustainable communication resources limit the deployment of large AI models over distributed wireless networks. This paper provides a comprehensive overview of privacy, security, and trustworthiness for distributed wireless large AI models (WLAM). In particular, a detailed privacy and security analysis for distributed WLAM is first presented. The classifications and theoretical findings about privacy and security in distributed WLAM are discussed. Then the trustworthiness and ethics of implementing distributed WLAM are described. Finally, the comprehensive applications of distributed WLAM are presented in the context of electromagnetic signal processing.

NI · Apr 22, 2024
Mapping Wireless Networks into Digital Reality through Joint Vertical and Horizontal Learning

Zifan Zhang, Mingzhe Chen, Zhaohui Yang et al.

In recent years, the complexity of 5G and beyond wireless networks has escalated, prompting a need for innovative frameworks to facilitate flexible management and efficient deployment. The concept of digital twins (DTs) has emerged as a solution to enable real-time monitoring, predictive configurations, and decision-making processes. While existing works primarily focus on leveraging DTs to optimize wireless networks, a detailed mapping methodology for creating virtual representations of network infrastructure and properties is still lacking. In this context, we introduce VH-Twin, a novel time-series data-driven framework that effectively maps wireless networks into digital reality. VH-Twin distinguishes itself through complementary vertical twinning (V-twinning) and horizontal twinning (H-twinning) stages, followed by a periodic clustering mechanism used to virtualize network regions based on their distinct geological and wireless characteristics. Specifically, V-twinning exploits distributed learning techniques to initialize a global twin model collaboratively from virtualized network clusters. H-twinning, on the other hand, is implemented with an asynchronous mapping scheme that dynamically updates twin models in response to network or environmental changes. Leveraging real-world wireless traffic data within a cellular wireless network, comprehensive experiments are conducted to verify that VH-Twin can effectively construct, deploy, and maintain network DTs. Parametric analysis also offers insights into how to strike a balance between twinning efficiency and model accuracy at scale.

IT · Mar 24, 2025
Byzantine-Resilient Over-the-Air Federated Learning under Zero-Trust Architecture

Jiacheng Yao, Wei Shi, Wei Xu et al.

Over-the-air computation (AirComp) has emerged as an essential approach for enabling communication-efficient federated learning (FL) over wireless networks. Nonetheless, the inherent analog transmission mechanism in AirComp-based FL (AirFL) intensifies challenges posed by potential Byzantine attacks. In this paper, we propose a novel Byzantine-robust FL paradigm for over-the-air transmissions, referred to as federated learning with secure adaptive clustering (FedSAC). FedSAC aims to protect a portion of the devices from attacks through zero trust architecture (ZTA) based Byzantine identification and adaptive device clustering. By conducting a one-step convergence analysis, we theoretically characterize the convergence behavior with different device clustering mechanisms and uneven aggregation weighting factors for each device. Building upon our analytical results, we formulate a joint optimization problem for the clustering and weighting factors in each communication round. To facilitate the targeted optimization, we propose a dynamic Byzantine identification method using historical reputation based on ZTA. Furthermore, we introduce a sequential clustering method, transforming the joint optimization into a weighting optimization problem without sacrificing the optimality. To optimize the weighting, we capitalize on the penalty convex-concave procedure (P-CCP) to obtain a stationary solution. Numerical results substantiate the superiority of the proposed FedSAC over existing methods in terms of both test accuracy and convergence rate.

RO · Jun 29, 2025
Benchmarking Generalizable Bimanual Manipulation: RoboTwin Dual-Arm Collaboration Challenge at CVPR 2025 MEIS Workshop

Tianxing Chen, Kaixuan Wang, Zhaohui Yang et al.

Embodied Artificial Intelligence (Embodied AI) is an emerging frontier in robotics, driven by the need for autonomous systems that can perceive, reason, and act in complex physical environments. While single-arm systems have shown strong task performance, collaborative dual-arm systems are essential for handling more intricate tasks involving rigid, deformable, and tactile-sensitive objects. To advance this goal, we launched the RoboTwin Dual-Arm Collaboration Challenge at the 2nd MEIS Workshop, CVPR 2025. Built on the RoboTwin Simulation platform (1.0 and 2.0) and the AgileX COBOT-Magic Robot platform, the competition consisted of three stages: Simulation Round 1, Simulation Round 2, and a final Real-World Round. Participants tackled a total of 17 dual-arm manipulation tasks, covering rigid, deformable, and tactile-based scenarios. The challenge attracted 64 global teams and over 400 participants, producing top-performing solutions like SEM and AnchorDP3 and generating valuable insights into generalizable bimanual policy learning. This report outlines the competition setup, task design, evaluation methodology, key findings, and future directions, aiming to support future research on robust and generalizable bimanual manipulation policies. The Challenge Webpage is available at https://robotwin-benchmark.github.io/cvpr-2025-challenge/.

AI · May 20, 2025
Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning

Zhaohui Yang, Yuxiao Ye, Shilei Jiang et al.

Recent advances in reasoning language models have witnessed a paradigm shift from short to long CoT patterns. Given the substantial computational cost of rollouts in long CoT models, maximizing the utility of fixed training datasets becomes crucial. Our analysis reveals that negative responses contain valuable components such as self-reflection and error-correction steps, yet most existing methods either completely discard negative samples (RFT) or apply equal penalization across all tokens (RL), failing to leverage these potential learning signals. In light of this, we propose Behavior Constrained Policy Gradient with Negative Sample Augmentation (BCPG-NSA), a fine-grained offline RL framework that encompasses three stages: 1) sample segmentation, 2) consensus-based step correctness assessment combining LLM and PRM judgers, and 3) policy optimization with NSA designed to effectively mine positive steps within negative samples. Experimental results show that BCPG-NSA outperforms baselines on several challenging math/coding reasoning benchmarks using the same training dataset, achieving improved sample efficiency and demonstrating robustness and scalability when extended to multiple iterations.

NI · Apr 1
Agentic AI-Empowered Wireless Agent Networks With Semantic-Aware Collaboration via ILAC

Zhouxiang Zhao, Jiaxiang Wang, Zhaohui Yang et al.

The rapid development of agentic artificial intelligence (AI) is driving future wireless networks to evolve from passive data pipes into intelligent collaborative ecosystems under the emerging paradigm of integrated learning and communication (ILAC). However, realizing efficient agentic collaboration faces challenges not only in handling semantic redundancy but also in the lack of an integrated mechanism for communication, computation, and control. To address this, we propose a wireless agent network (WAN) framework that orchestrates a progressive knowledge aggregation mechanism. Specifically, we formulate the aggregation process as a joint energy minimization problem where the agents perform semantic compression to eliminate redundancy, optimize transmission power to deliver semantic payloads, and adjust physical trajectories to proactively enhance channel qualities. To solve this problem, we develop a hierarchical algorithm that integrates inner-level resource optimization with outer-level topology evolution. Theoretically, we reveal that incorporating a potential field into the topology evolution effectively overcomes the short-sightedness of greedy matching, providing a mathematically rigorous heuristic for long-term energy minimization. Simulation results demonstrate that the proposed framework achieves superior energy efficiency and scalability compared to conventional benchmarks, validating the efficacy of semantic-aware collaboration in dynamic environments.

NI · Feb 7, 2025
Optimizing Wireless Resource Management and Synchronization in Digital Twin Networks

Hanzhi Yu, Yuchen Liu, Zhaohui Yang et al.

In this paper, we investigate accurate synchronization between a physical network and its digital network twin (DNT), which serves as a virtual representation of the physical network. The considered network includes a set of base stations (BSs) that must allocate their limited spectrum resources to serve a set of users while also transmitting their partially observed physical network information to a cloud server to generate the DNT. Since the DNT can predict the physical network status based on its historical status, the BSs may not need to send their physical network information at each time slot, allowing them to conserve spectrum resources to serve the users. However, if the DNT does not receive the physical network information of the BSs over a long time period, the DNT's accuracy in representing the physical network may degrade. To this end, each BS must decide when to send the physical network information to the cloud server to update the DNT, while also determining the spectrum resource allocation policy for both DNT synchronization and serving the users. We formulate this resource allocation task as an optimization problem, aiming to maximize the total data rate of all users while minimizing the asynchronization between the physical network and the DNT. To address this problem, we propose a method based on gated recurrent units (GRUs) and the value decomposition network (VDN). Simulation results show that our GRU and VDN based algorithm improves the weighted sum of data rates and the similarity between the status of the DNT and the physical network by up to 28.96%, compared to a baseline method combining GRU with independent Q-learning.

IT · Feb 15, 2024
Digital versus Analog Transmissions for Federated Learning over Wireless Networks

Jiacheng Yao, Wei Xu, Zhaohui Yang et al.

In this paper, we quantitatively compare these two effective communication schemes, i.e., digital and analog ones, for wireless federated learning (FL) over resource-constrained networks, highlighting their essential differences as well as their respective application scenarios. We first examine both digital and analog transmission methods, together with a unified and fair comparison scheme under practical constraints. A universal convergence analysis under various imperfections is established for FL performance evaluation in wireless networks. These analytical results reveal that the fundamental difference between the two paradigms lies in whether communication and computation are jointly designed or not. The digital schemes decouple the communication design from specific FL tasks, making it difficult to support simultaneous uplink transmission of massive devices with limited bandwidth. In contrast, the analog communication allows over-the-air computation (AirComp), thus achieving efficient spectrum utilization. However, computation-oriented analog transmission reduces power efficiency, and its performance is sensitive to computational errors. Finally, numerical simulations are conducted to verify these theoretical observations.
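
The difference between the two aggregation styles can be illustrated with a small simulation. This is a sketch under strong assumptions that are not taken from the paper: an ideal channel with perfect power control (so the signals add with unit gain), real-valued additive Gaussian noise at the receiver, error-free decoding in the digital case, and hypothetical function names and update values.

```python
import random

def digital_average(updates):
    """Digital scheme: every device's update is decoded separately, then averaged."""
    k = len(updates)
    dim = len(updates[0])
    return [sum(u[j] for u in updates) / k for j in range(dim)]

def aircomp_average(updates, noise_std, rng):
    """Analog scheme: all devices transmit at once and the channel sums the signals.

    The server never sees individual updates, only the noisy superposition.
    """
    k = len(updates)
    dim = len(updates[0])
    received = [sum(u[j] for u in updates) + rng.gauss(0.0, noise_std)
                for j in range(dim)]
    return [y / k for y in received]

# Three hypothetical devices, each holding a 2-dimensional model update.
local_updates = [[0.2, -1.0], [0.4, -0.8], [0.6, -1.2]]
rng = random.Random(0)

print(digital_average(local_updates))
print(aircomp_average(local_updates, noise_std=0.05, rng=rng))
```

In this toy model the analog scheme uses one channel use per coordinate regardless of the number of devices, while its result carries receiver noise scaled by 1/K. That is the spectrum-efficiency versus error-sensitivity trade-off described in the abstract.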

LG · May 19, 2025
Confidence-Regulated Generative Diffusion Models for Reliable AI Agent Migration in Vehicular Metaverses

Yingkai Kang, Jiawen Kang, Jinbo Wen et al.

Vehicular metaverses are an emerging paradigm that merges intelligent transportation systems with virtual spaces, leveraging advanced digital twin and Artificial Intelligence (AI) technologies to seamlessly integrate vehicles, users, and digital environments. In this paradigm, vehicular AI agents are endowed with environment perception, decision-making, and action execution capabilities, enabling real-time processing and analysis of multi-modal data to provide users with customized interactive services. Since vehicular AI agents require substantial resources for real-time decision-making, given vehicle mobility and network dynamics conditions, the AI agents are deployed in RoadSide Units (RSUs) with sufficient resources and dynamically migrated among them. However, AI agent migration requires frequent data exchanges, which may expose vehicular metaverses to potential cyber attacks. To this end, we propose a reliable vehicular AI agent migration framework, achieving reliable dynamic migration and efficient resource scheduling through cooperation between vehicles and RSUs. Additionally, we design a trust evaluation model based on the theory of planned behavior to dynamically quantify the reputation of RSUs, thereby better accommodating the personalized trust preferences of users. We then model the vehicular AI agent migration process as a partially observable Markov decision process and develop a Confidence-regulated Generative Diffusion Model (CGDM) to efficiently generate AI agent migration decisions. Numerical results demonstrate that the CGDM algorithm significantly outperforms baseline methods in reducing system latency and enhancing robustness against cyber attacks.

IT · Apr 7
Near-Field Integrated Sensing, Computing and Semantic Communication in Digital Twin-Assisted Vehicular Networks

Yinchao Yang, Yahao Ding, Jiaxiang Wang et al.

Digital twin (DT) technology offers transformative potential for vehicular networks, enabling high-fidelity virtual representations for enhanced safety and automation. However, seamless DT synchronization in dynamic environments faces challenges such as massive data transmission, precision sensing, and strict computational constraints. This paper proposes an integrated sensing, computing, and semantic communication (ISCSC) framework tailored for DT-assisted vehicular networks in the near-field (NF) regime. Leveraging a multi-user multiple-input multiple-output (MU-MIMO) configuration, each roadside unit (RSU) employs semantic communication to serve vehicles while simultaneously utilizing millimeter-wave (mmWave) radar for environmental mapping. We implement particle filtering at RSUs to achieve high-precision vehicle tracking. To optimize performance, we formulate a joint optimization problem balancing semantic communication rates and sensing accuracy under limited computational resources and power budget. Our solution includes a hybrid heuristic algorithm for vehicle-to-RSU assignment and an alternating optimization approach for determining semantic extraction ratios and beamforming matrices. Performance is extensively evaluated via the Cramér-Rao bound (CRB) for angle and distance estimation, semantic transmission rates, and resource utilization. Numerical results demonstrate that the proposed ISCSC framework achieves a 20% improvement in transmission rate while maintaining the sensing accuracy of existing integrated sensing and communication (ISAC) schemes under constrained resource conditions.

RO · Oct 7, 2025
Federated Split Learning for Resource-Constrained Robots in Industrial IoT: Framework Comparison, Optimization Strategies, and Future Directions

Wanli Ni, Hui Tian, Shuai Wang et al.

Federated split learning (FedSL) has emerged as a promising paradigm for enabling collaborative intelligence in industrial Internet of Things (IoT) systems, particularly in smart factories where data privacy, communication efficiency, and device heterogeneity are critical concerns. In this article, we present a comprehensive study of FedSL frameworks tailored for resource-constrained robots in industrial scenarios. We compare synchronous, asynchronous, hierarchical, and heterogeneous FedSL frameworks in terms of workflow, scalability, adaptability, and limitations under dynamic industrial conditions. Furthermore, we systematically categorize token fusion strategies into three paradigms: input-level (pre-fusion), intermediate-level (intra-fusion), and output-level (post-fusion), and summarize their respective strengths in industrial applications. We also provide adaptive optimization techniques to enhance the efficiency and feasibility of FedSL implementation, including model compression, split layer selection, computing frequency allocation, and wireless resource management. Simulation results validate the performance of these frameworks under industrial detection scenarios. Finally, we outline open issues and research directions of FedSL in future smart manufacturing systems.

LG · Jul 19, 2025
Rec-AD: An Efficient Computation Framework for FDIA Detection Based on Tensor Train Decomposition and Deep Learning Recommendation Model

Yunfeng Li, Junhong Liu, Zhaohui Yang et al.

Deep learning models have been widely adopted for False Data Injection Attack (FDIA) detection in smart grids due to their ability to capture unstructured and sparse features. However, the increasing system scale and data dimensionality introduce significant computational and memory burdens, particularly in large-scale industrial datasets, limiting detection efficiency. To address these issues, this paper proposes Rec-AD, a computationally efficient framework that integrates Tensor Train decomposition with the Deep Learning Recommendation Model (DLRM). Rec-AD enhances training and inference efficiency through embedding compression, optimized data access via index reordering, and a pipeline training mechanism that reduces memory communication overhead. Fully compatible with PyTorch, Rec-AD can be integrated into existing FDIA detection systems without code modifications. Experimental results show that Rec-AD significantly improves computational throughput and real-time detection performance, narrowing the attack window and increasing attacker cost. These advancements strengthen edge computing capabilities and scalability, providing robust technical support for smart grid security.

LG · Jul 10, 2025
Contrastive Language-Image Pre-Training Model based Semantic Communication Performance Optimization

Shaoran Yang, Dongyu Wei, Hanzhi Yu et al.

In this paper, a novel contrastive language-image pre-training (CLIP) model based semantic communication framework is designed. Compared to standard neural network (e.g., convolutional neural network) based semantic encoders and decoders that require joint training over a common dataset, our CLIP model based method does not require any training procedure. The transmitter can therefore extract the meaning of the original data without training a neural network model, and the receiver can train a neural network for follow-up tasks without communicating with the transmitter. Next, we investigate the deployment of the CLIP model based semantic framework over a noisy wireless network. Since the semantic information generated by the CLIP model is susceptible to wireless noise and the spectrum used for semantic information transmission is limited, it is necessary to jointly optimize the CLIP model architecture and spectrum resource block (RB) allocation to maximize semantic communication performance while accounting for wireless noise and for the delay and energy consumed by semantic communication. To achieve this goal, we use a proximal policy optimization (PPO) based reinforcement learning (RL) algorithm to learn how wireless noise affects semantic communication performance, thereby finding the optimal CLIP model and RB allocation for each user. Simulation results show that our proposed method improves the convergence rate by up to 40%, and the accumulated reward by 4x compared to soft actor-critic.

LGMar 27, 2024
Energy-Guided Data Sampling for Traffic Prediction with Mini Training Datasets

Zhaohui Yang, Kshitij Jerath

Recent endeavors aimed at forecasting future traffic flow states through deep learning encounter various challenges and yield diverse outcomes. A notable obstacle arises from the substantial data requirements of deep learning models, a resource often scarce in traffic flow systems. Despite the abundance of domain knowledge concerning traffic flow dynamics, prevailing deep learning methodologies frequently fail to fully exploit it. To address these issues, we propose an innovative solution that merges Convolutional Neural Networks (CNNs) with Long Short-Term Memory (LSTM) architecture to enhance the prediction of traffic flow dynamics. A key revelation of our research is the feasibility of sampling training data for large traffic systems from simulations conducted on smaller traffic systems. This insight suggests the potential for referencing a macroscopic-level distribution to inform the sampling of microscopic data. Such sampling is facilitated by the observed scale invariance in the normalized energy distribution of the statistical mechanics model, thereby streamlining the data generation process for large-scale traffic systems. Our simulations demonstrate promising agreement between predicted and actual traffic flow dynamics, underscoring the efficacy of our proposed approach.

AIOct 6, 2025
LMM-Incentive: Large Multimodal Model-based Incentive Design for User-Generated Content in Web 3.0

Jinbo Wen, Jiawen Kang, Linfeng Zhang et al.

Web 3.0 represents the next generation of the Internet, which is widely recognized as a decentralized ecosystem that focuses on value expression and data ownership. By leveraging blockchain and artificial intelligence technologies, Web 3.0 offers unprecedented opportunities for users to create, own, and monetize their content, thereby elevating User-Generated Content (UGC) to an entirely new level. However, some self-interested users may exploit the limitations of content curation mechanisms and generate low-quality content with less effort, obtaining platform rewards under information asymmetry. Such behavior can undermine Web 3.0 performance. To this end, we propose LMM-Incentive, a novel Large Multimodal Model (LMM)-based incentive mechanism for UGC in Web 3.0. Specifically, we propose an LMM-based contract-theoretic model to motivate users to generate high-quality UGC, thereby mitigating the adverse selection problem from information asymmetry. To alleviate potential moral hazards after contract selection, we leverage LMM agents to evaluate UGC quality, which is the primary component of the contract, utilizing prompt engineering techniques to improve the evaluation performance of LMM agents. Recognizing that traditional contract design methods cannot effectively adapt to the dynamic environment of Web 3.0, we develop an improved Mixture of Experts (MoE)-based Proximal Policy Optimization (PPO) algorithm for optimal contract design. Simulation results demonstrate the superiority of the proposed MoE-based PPO algorithm over representative benchmarks in the context of contract design. Finally, we deploy the designed contract within an Ethereum smart contract framework, further validating the effectiveness of the proposed scheme.

LGJul 20, 2025
Clustered Federated Learning for Generalizable FDIA Detection in Smart Grids with Heterogeneous Data

Yunfeng Li, Junhong Liu, Zhaohui Yang et al.

False Data Injection Attacks (FDIAs) pose severe security risks to smart grids by manipulating measurement data collected from spatially distributed devices such as SCADA systems and PMUs. These measurements typically exhibit Non-Independent and Identically Distributed (Non-IID) characteristics across different regions, which significantly challenges the generalization ability of detection models. Traditional centralized training approaches not only face privacy risks and data sharing constraints but also incur high transmission costs, limiting their scalability and deployment feasibility. To address these issues, this paper proposes a privacy-preserving federated learning framework, termed Federated Cluster Average (FedClusAvg), designed to improve FDIA detection in Non-IID and resource-constrained environments. FedClusAvg incorporates cluster-based stratified sampling and hierarchical communication (client-subserver-server) to enhance model generalization and reduce communication overhead. By enabling localized training and weighted parameter aggregation, the algorithm achieves accurate model convergence without centralizing sensitive data. Experimental results on benchmark smart grid datasets demonstrate that FedClusAvg not only improves detection accuracy under heterogeneous data distributions but also significantly reduces communication rounds and bandwidth consumption. This work provides an effective solution for secure and efficient FDIA detection in large-scale distributed power systems.
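
The client-subserver-server aggregation described above can be illustrated with a two-stage, sample-weighted average. This is only a sketch under simplifying assumptions, not the FedClusAvg algorithm itself: model parameters are plain lists of floats, the weights are local sample counts, and the cluster-based stratified sampling step is omitted. All names are hypothetical.

```python
def weighted_average(updates):
    """Average parameter vectors, weighting each by its sample count.

    `updates` is a list of (params, num_samples) pairs.
    Returns (averaged_params, total_samples).
    """
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(updates[0][0])
    avg = [sum(p[i] * n for p, n in updates) / total for i in range(dim)]
    return avg, total


def hierarchical_aggregate(clusters):
    """Two-stage aggregation: clients -> subserver (per cluster) -> server."""
    # Each subserver averages its own clients and reports its total sample count.
    subserver_models = [weighted_average(clients) for clients in clusters]
    # The server averages the subserver models, weighted by those totals.
    global_params, _ = weighted_average(subserver_models)
    return global_params


# Hypothetical example: two clusters, three clients, two parameters each.
clusters = [
    [([1.0, 2.0], 10), ([3.0, 4.0], 30)],
    [([5.0, 6.0], 60)],
]
print(hierarchical_aggregate(clusters))
```

Because each subserver forwards its total sample count, the two-stage result matches a single flat weighted average over all clients, while only one model per cluster travels to the server.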

ITJun 30, 2025
Bridging Physical and Digital Worlds: Embodied Large AI for Future Wireless Systems

Xinquan Wang, Fenghao Zhu, Zhaohui Yang et al.

Large artificial intelligence (AI) models offer revolutionary potential for future wireless systems, promising unprecedented capabilities in network optimization and performance. However, current paradigms largely overlook crucial physical interactions. This oversight means they primarily rely on offline datasets, leading to difficulties in handling real-time wireless dynamics and non-stationary environments. Furthermore, these models often lack the capability for active environmental probing. This paper proposes a fundamental paradigm shift towards wireless embodied large AI (WELAI), moving from passive observation to active embodiment. We first identify key challenges faced by existing models, and then explore the design principles and system structure of WELAI. We also outline prospective applications in next-generation wireless. Finally, through an illustrative case study, we demonstrate the effectiveness of WELAI and point out promising research directions for realizing adaptive, robust, and autonomous wireless systems.

CVJun 15, 2025
Semantic-Aware Visual Information Transmission With Key Information Extraction Over Wireless Networks

Chen Zhu, Kang Liang, Jianrong Bao et al.

The advent of 6G networks demands unprecedented levels of intelligence, adaptability, and efficiency to address challenges such as ultra-high-speed data transmission, ultra-low latency, and massive connectivity in dynamic environments. Traditional wireless image transmission frameworks, reliant on static configurations and isolated source-channel coding, struggle to balance computational efficiency, robustness, and quality under fluctuating channel conditions. To bridge this gap, this paper proposes an AI-native deep joint source-channel coding (JSCC) framework tailored for resource-constrained 6G networks. Our approach integrates key information extraction and adaptive background synthesis to enable intelligent, semantic-aware transmission. Leveraging AI-driven tools (Mediapipe for human pose detection and Rembg for background removal), the model dynamically isolates foreground features and matches backgrounds from a pre-trained library, reducing data payloads while preserving visual fidelity. Experimental results demonstrate significant improvements in peak signal-to-noise ratio (PSNR) compared with the traditional JSCC method, especially under low-SNR conditions. This approach offers a practical solution for multimedia services in resource-constrained mobile communications.

SPMay 19, 2025
Multi-View Wireless Sensing via Conditional Generative Learning: Framework and Model Design

Ziqing Xing, Zhaoyang Zhang, Zirui Chen et al.

In this paper, we incorporate physical knowledge into learning-based high-precision target sensing using the multi-view channel state information (CSI) between multiple base stations (BSs) and user equipment (UEs). This kind of multi-view sensing problem can be naturally cast into a conditional generation framework. To this end, we design a bipartite neural network architecture: the first part uses an elaborately designed encoder to fuse the latent target features embedded in the multi-view CSI, and the second part uses them as conditioning inputs of a powerful generative model to guide the target's reconstruction. Specifically, the encoder is designed to capture the physical correlation between the CSI and the target, and to adapt to the numbers and positions of BS-UE pairs. Therein, the view-specific nature of CSI is assimilated by introducing a spatial positional embedding scheme, which exploits the structure of electromagnetic (EM)-wave propagation channels. Finally, a conditional diffusion model with a weighted loss is employed to generate the target's point cloud from the fused features. Extensive numerical results demonstrate that the proposed generative multi-view (Gen-MV) sensing framework exhibits excellent flexibility and significant performance improvement in the reconstruction quality of the target's shape and EM properties.

LGApr 20, 2025
Efficient Split Federated Learning for Large Language Models over Communication Networks

Kai Zhao, Zhaohui Yang, Ye Hu et al.

Fine-tuning pre-trained large language models (LLMs) in a distributed manner poses significant challenges on resource-constrained edge networks. To address this challenge, we propose SflLLM, a novel framework that integrates split federated learning with parameter-efficient fine-tuning techniques. By leveraging model splitting and low-rank adaptation (LoRA), SflLLM reduces the computational burden on edge devices. Furthermore, the introduction of a federated server facilitates parallel training and enhances data privacy. To accommodate heterogeneous communication conditions and diverse computational capabilities of edge devices, as well as the impact of LoRA rank selection on model convergence and training cost, we formulate a joint optimization problem over both communication and computation resources. The formulated problem jointly optimizes subchannel allocation, power control, model splitting point selection, and LoRA rank configuration, with the aim of minimizing total training delay. An iterative optimization algorithm is proposed to solve this problem efficiently. Specifically, a greedy heuristic is employed for subchannel allocation, the power control subproblem is reformulated as a convex optimization problem using auxiliary variables, and an exhaustive search is adopted for optimal split position and rank selection. Simulation results demonstrate that the proposed SflLLM framework achieves comparable model accuracy while significantly reducing client-side computational requirements. Furthermore, the proposed resource allocation scheme and adaptive LoRA rank selection strategy notably reduce the training latency compared to conventional approaches.
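
As a rough illustration of why LoRA rank affects training cost, the sketch below counts trainable parameters for a single weight matrix under the standard LoRA formulation, where a frozen `d_out x d_in` weight receives a low-rank update `B @ A`. The function names and the 4096 x 4096 layer size are assumptions for illustration and are not taken from SflLLM.

```python
def full_params(d_in, d_out):
    """Parameters in a dense d_out x d_in weight matrix."""
    return d_in * d_out


def lora_trainable_params(d_in, d_out, rank):
    """Trainable parameters of a rank-`rank` LoRA update B @ A.

    A has shape (rank, d_in) and B has shape (d_out, rank); the original
    weight stays frozen and contributes no trainable parameters.
    """
    if not 0 < rank <= min(d_in, d_out):
        raise ValueError("rank must be between 1 and min(d_in, d_out)")
    return rank * d_in + d_out * rank


# Hypothetical layer size; the fraction grows linearly with the rank.
for rank in (4, 8, 16):
    frac = lora_trainable_params(4096, 4096, rank) / full_params(4096, 4096)
    print(rank, f"{frac:.4%}")
```

The linear growth in trainable parameters with rank is one reason rank selection can be treated as a resource allocation variable alongside the split point.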

LGJun 5, 2024
Near-field Beam training for Extremely Large-scale MIMO Based on Deep Learning

Jiali Nie, Yuanhao Cui, Zhaohui Yang et al.

Extremely Large-scale Array (ELAA) is considered a frontier technology for future communication systems, pivotal in improving wireless systems' rate and spectral efficiency. As ELAA employs a multitude of antennas operating at higher frequencies, users are typically situated in the near-field region where the spherical wavefront propagates. The near-field beam training in ELAA requires both angle and distance information, which inevitably leads to a significant increase in the beam training overhead. To address this problem, we propose a near-field beam training method based on deep learning. We use a convolutional neural network (CNN) to efficiently learn channel characteristics from historical data by strategically selecting padding and kernel sizes. The negative value of the user average achievable rate is utilized as the loss function to optimize the beamformer. This method maximizes multi-user networks' achievable rate without predefined beam codebooks. Upon deployment, the model requires solely the pre-estimated channel state information (CSI) to derive the optimal beamforming vector. The simulation results demonstrate that the proposed scheme achieves a more stable beamforming gain and significantly improves performance compared to the traditional beam training method. Furthermore, owing to the inherent traits of deep learning methodologies, this approach substantially diminishes the near-field beam training overhead.
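
The idea of using the negative average achievable rate as a loss can be sketched as follows. This shows only the loss computation, not the CNN or the training loop, and it assumes a common multi-user SINR model in which other users' beams act as interference; the paper's exact signal model may differ. All names are hypothetical.

```python
from math import log2


def inner(h, w):
    """Hermitian inner product h^H w for complex vectors given as lists."""
    return sum(hi.conjugate() * wi for hi, wi in zip(h, w))


def negative_average_rate(channels, beams, noise_power):
    """Loss = -(1/K) * sum_k log2(1 + SINR_k) over K users.

    channels[k] is user k's channel vector and beams[k] is the beamforming
    vector serving user k. Beams of other users count as interference.
    """
    rates = []
    for k, h in enumerate(channels):
        signal = abs(inner(h, beams[k])) ** 2
        interference = sum(
            abs(inner(h, w)) ** 2 for j, w in enumerate(beams) if j != k
        )
        rates.append(log2(1 + signal / (interference + noise_power)))
    return -sum(rates) / len(rates)


# Hypothetical two-user, two-antenna example with orthogonal channels.
channels = [[1 + 0j, 0j], [0j, 1 + 0j]]
matched_beams = [[1 + 0j, 0j], [0j, 1 + 0j]]
print(negative_average_rate(channels, matched_beams, noise_power=1.0))
```

Minimizing this quantity is equivalent to maximizing the average rate, which is why no beam codebook labels are needed as training targets.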