Ruichen Zhang

NI
h-index116
43papers
455citations
Novelty43%
AI Score54

43 Papers

LGMay 27
Chreode: A Cell World Model for One-Step Temporal Dynamics and Perturbation Prediction

Mufan Qiu, Genhui Zheng, Yinuo Xu et al.

Predicting how a cell will change its transcriptional state under a developmental signal or a genetic perturbation is the computational core of in-silico biology and the AI Virtual Cell program. Existing approaches either fit static control-to-treated maps that discard time, or solve multi-step ODE / Schrödinger-bridge problems on each dataset independently. We introduce Chreode, a one-step cell world model that predicts action-conditioned cell-state transitions through a structured residual transition operator. It shifts distributional evolution from inference time to training time, enabling single-pass generation while preserving a Waddington-inspired decomposition into downhill landscape flow, rotational in-tangent dynamics, and stochastic spread. The model is pretrained with a shared scVI encoder and a DiT-based dynamics backbone on a 2.4M-cell mouse embryonic atlas spanning 7 datasets. As a fine-tuning initialization, Chreode improves per-target Sinkhorn distance on Weinreb hematopoiesis and Veres islet differentiation over matched scratch models, PI-SDE, and PRESCIENT. As a transferable gene-state embedding for GEARS, the pretrained dynamics representation reduces shared-vocabulary DE20 mean squared error on Norman Perturb-seq from 0.2121 to 0.1858, a 12.4% relative improvement, without changing the GEARS training procedure. We interpret this transfer to perturbation prediction as evidence that pretrained developmental-trajectory dynamics encode differentiation primitives transferable to CRISPR-induced state shifts, since both involve cell-state transitions in a shared latent geometry. The pretrained backbone additionally produces zero-shot clonal fate scores on Weinreb that are competitive with strong dynamic-OT baselines.

NIJan 15
Large Language Model (LLM)-enabled Reinforcement Learning for Wireless Network Optimization

Jie Zheng, Ruichen Zhang, Dusit Niyato et al.

Enhancing future wireless networks presents a significant challenge for networking systems due to diverse user demands and the emergence of 6G technology. While reinforcement learning (RL) is a powerful framework, it often encounters difficulties with high-dimensional state spaces and complex environments, leading to substantial computational demands, distributed intelligence, and potentially inconsistent outcomes. Large language models (LLMs), with their extensive pretrained knowledge and advanced reasoning capabilities, offer promising tools to enhance RL in optimizing 6G wireless networks. We explore RL models augmented by LLMs, emphasizing their roles and the potential benefits of their synergy in wireless network optimization. We then examine LLM-enabled RL across various protocol layers: physical, data link, network, transport, and application layers. Additionally, we propose an LLM-assisted state representation and semantic extraction to enhance the multi-agent reinforcement learning (MARL) framework. This approach is applied to service migration and request routing, as well as topology graph generation in unmanned aerial vehicle (UAV)-satellite networks. Through case studies, we demonstrate that our framework effectively performs optimization of wireless network. Finally, we outline prospective research directions for LLM-enabled RL in wireless network optimization.

CVNov 3, 2025
SecDiff: Diffusion-Aided Secure Deep Joint Source-Channel Coding Against Adversarial Attacks

Changyuan Zhao, Jiacheng Wang, Ruichen Zhang et al.

Deep joint source-channel coding (JSCC) has emerged as a promising paradigm for semantic communication, delivering significant performance gains over conventional separate coding schemes. However, existing JSCC frameworks remain vulnerable to physical-layer adversarial threats, such as pilot spoofing and subcarrier jamming, compromising semantic fidelity. In this paper, we propose SecDiff, a plug-and-play, diffusion-aided decoding framework that significantly enhances the security and robustness of deep JSCC under adversarial wireless environments. Different from prior diffusion-guided JSCC methods that suffer from high inference latency, SecDiff employs pseudoinverse-guided sampling and adaptive guidance weighting, enabling flexible step-size control and efficient semantic reconstruction. To counter jamming attacks, we introduce a power-based subcarrier masking strategy and recast recovery as a masked inpainting problem, solved via diffusion guidance. For pilot spoofing, we formulate channel estimation as a blind inverse problem and develop an expectation-minimization (EM)-driven reconstruction algorithm, guided jointly by reconstruction loss and a channel operator. Notably, our method alternates between pilot recovery and channel estimation, enabling joint refinement of both variables throughout the diffusion process. Extensive experiments over orthogonal frequency-division multiplexing (OFDM) channels under adversarial conditions show that SecDiff outperforms existing secure and generative JSCC baselines by achieving a favorable trade-off between reconstruction quality and computational cost. This balance makes SecDiff a promising step toward practical, low-latency, and attack-resilient semantic communications.

CRMay 25
Efficient and Privacy-Preserving Distribution Statistics Analytics on Mobile Spatial Data

Xuhao Ren, Mingyang Zhao, Ruichen Zhang et al.

With the rapid development of mobile computing technology, massive amounts of spatial data are continuously generated from various mobile terminals and sensing devices, such as smartphones, connected vehicles, and drones. Performing efficient distributed statistical analysis on this data is crucial for real-time mobile computing applications. However, the constrained and dynamic nature of mobile environments exacerbates the privacy challenge: centralizing sensitive data for analysis risks severe privacy leaks, while existing privacy-preserving techniques often introduce excessive overhead or inaccuracies In this paper, we design, implement, and evaluate the first system that supports efficient and privacy-preserving distribution statistics analysis for mobile spatial data. First, we propose eSpat-B, which leverages two non-colluding servers and a newly designed improved distributed point functions (DPF) with octree partitioning. Furthermore, considering the frequent updates of spatial data, we propose another more efficient scheme, eSpat+. The core idea of this scheme is to utilize a K-Dimensional tree for spatial partitioning, combine it with incremental DPF for performing statistics analysis, and design an efficient update algorithm. Security analysis demonstrates that our schemes effectively protect data privacy throughout the statistical process. Theoretical analysis and experimental results on real-world mobile trajectory datasets demonstrate that our proposed schemes achieve a reduction of approximately 1.2* in computation overhead, 20* in communication overhead, and maintain 100% accuracy.

NIMar 12
Efficient Cross-View Localization in 6G Space-Air-Ground Integrated Network

Min Hao, Yanbing Xu, Maoqiang Wu et al.

Recently, visual localization has become an important supplement to improve localization reliability, and cross-view approaches can greatly enhance coverage and adaptability. Meanwhile, future 6G will enable a globally covered mobile communication system, with a space-air-ground integrated network (SAGIN) serving as key supporting architecture. Inspired by this, we explore an integration of cross-view localization (CVL) with 6G SAGIN, thereby enhancing its performance in latency, energy consumption, and privacy protection. First, we provide a comprehensive review of CVL and SAGIN, highlighting their capabilities, integration opportunities, and potential applications. Benefiting from the fast and extensive image collection and transmission capabilities of the 6G SAGIN architecture, CVL achieves higher localization accuracy and faster processing speed. Then, we propose a split-inference framework for implementing CVL, which fully leverages the distributed communication and computing resources of the 6G SAGIN architecture. Subsequently, we conduct joint optimization of communication, computation, and confidentiality within the proposed split-inference framework, aiming to provide a paradigm and a direction for making CVL efficient. Experimental results validate the effectiveness of the proposed framework and provide solutions to the optimization problem. Finally, we discuss potential research directions for 6G SAGIN-enabled CVL.

CVMar 13
Leveraging Large Vision Model for Multi-UAV Co-perception in Low-Altitude Wireless Networks

Yunting Xu, Jiacheng Wang, Ruichen Zhang et al.

Multi-uncrewed aerial vehicle (UAV) cooperative perception has emerged as a promising paradigm for diverse low-altitude economy applications, where complementary multi-view observations are leveraged to enhance perception performance via wireless communications. However, the massive visual data generated by multiple UAVs poses significant challenges in terms of communication latency and resource efficiency. To address these challenges, this paper proposes a communication-efficient cooperative perception framework, termed Base-Station-Helped UAV (BHU), which reduces communication overhead while enhancing perception performance. Specifically, we employ a Top-K selection mechanism to identify the most informative pixels from UAV-captured RGB images, enabling sparsified visual transmission with reduced data volume and latency. The sparsified images are transmitted to a ground server via multi-user MIMO (MU-MIMO), where a Swin-large-based MaskDINO encoder extracts bird's-eye-view (BEV) features and performs cooperative feature fusion for ground vehicle perception. Furthermore, we develop a diffusion model-based deep reinforcement learning (DRL) algorithm to jointly select cooperative UAVs, sparsification ratios, and precoding matrices, achieving a balance between communication efficiency and perception utility. Simulation results on the Air-Co-Pred dataset demonstrate that, compared with traditional CNN-based BEV fusion baselines, the proposed BHU framework improves perception performance by over 5% while reducing communication overhead by 85%, providing an effective solution for multi-UAV cooperative perception under resource-constrained wireless environments.

NIMar 2
Federated Agentic AI for Wireless Networks: Fundamentals, Approaches, and Applications

Lingyi Cai, Yu Zhang, Ruichen Zhang et al.

Agentic artificial intelligence (AI) presents a promising pathway toward realizing autonomous and self-improving wireless network services. However, resource-constrained, widely distributed, and data-heterogeneous nature of wireless networks poses significant challenges to existing agentic AI that relies on centralized architectures, leading to high communication overhead, privacy risks, and non-independent and identically distributed (non-IID) data. Federated learning (FL) has the potential to improve the overall loop of agentic AI through collaborative local learning and parameter sharing without exchanging raw data. This paper proposes new federated agentic AI approaches for wireless networks. We first summarize fundamentals of agentic AI and mainstream FL types. Then, we illustrate how each FL type can strengthen a specific component of agentic AI's loop. Moreover, we conduct a case study on using FRL to improve the performance of agentic AI's action decision in low-altitude wireless networks (LAWNs). Finally, we provide a conclusion and discuss future research directions.

CRMar 12
Delayed Backdoor Attacks: Exploring the Temporal Dimension as a New Attack Surface in Pre-Trained Models

Zikang Ding, Haomiao Yang, Meng Hao et al.

Backdoor attacks against pre-trained models (PTMs) have traditionally operated under an ``immediacy assumption,'' where malicious behavior manifests instantly upon trigger occurrence. This work revisits and challenges this paradigm by introducing \textit{\textbf{Delayed Backdoor Attacks (DBA)}}, a new class of threats in which activation is temporally decoupled from trigger exposure. We propose that this \textbf{temporal dimension} is the key to unlocking a previously infeasible class of attacks: those that use common, everyday words as triggers. To examine the feasibility of this paradigm, we design and implement a proof-of-concept prototype, termed \underline{D}elayed Backdoor Attacks Based on \underline{N}onlinear \underline{D}ecay (DND). DND embeds a lightweight, stateful logic module that postpones activation until a configurable threshold is reached, producing a distinct latency phase followed by a controlled outbreak. We derive a formal model to characterize this latency behavior and propose a dual-metric evaluation framework (ASR and ASR$_{delay}$) to empirically measure the delay effect. Extensive experiments on four (natural language processing)NLP benchmarks validate the core capabilities of DND: it remains dormant for a controllable duration, sustains high clean accuracy ($\ge$94\%), and achieves near-perfect post-activation attack success rates ($\approx$99\%, The average of other methods is below 95\%.). Moreover, DND exhibits resilience against several state-of-the-art defenses. This study provides the first empirical evidence that the temporal dimension constitutes a viable yet unprotected attack surface in PTMs, underscoring the need for next-generation, stateful, and time-aware defense mechanisms.

PLASM-PHApr 7
Plasma GraphRAG: Physics-Grounded Parameter Selection for Gyrokinetic Simulations

Ruichen Zhang, Feda AlMuhisen, Chenguang Wan et al.

Accurate parameter selection is fundamental to gyrokinetic plasma simulations, yet current practices rely heavily on manual literature reviews, leading to inefficiencies and inconsistencies. We introduce Plasma GraphRAG, a novel framework that integrates Graph Retrieval-Augmented Generation (GraphRAG) with large language models (LLMs) for automated, physics-grounded parameter range identification. By constructing a domain-specific knowledge graph from curated plasma literature and enabling structured retrieval over graph-anchored entities and relations, Plasma GraphRAG enables LLMs to generate accurate, context-aware recommendations. Extensive evaluations across five metrics, comprehensiveness, diversity, grounding, hallucination, and empowerment, demonstrate that Plasma GraphRAG outperforms vanilla RAG by over $10\%$ in overall quality and reduces hallucination rates by up to $25\%$. {Beyond enhancing simulation reliability, Plasma GraphRAG offers a methodology for accelerating scientific discovery across complex, data-rich domains.

AIMay 24, 2025Code
The Quest for Efficient Reasoning: A Data-Centric Benchmark to CoT Distillation

Ruichen Zhang, Rana Muhammad Shahroz Khan, Zhen Tan et al.

Data-centric distillation, including data augmentation, selection, and mixing, offers a promising path to creating smaller, more efficient student Large Language Models (LLMs) that retain strong reasoning abilities. However, there still lacks a comprehensive benchmark to systematically assess the effect of each distillation approach. This paper introduces DC-CoT, the first data-centric benchmark that investigates data manipulation in chain-of-thought (CoT) distillation from method, model and data perspectives. Utilizing various teacher models (e.g., o4-mini, Gemini-Pro, Claude-3.5) and student architectures (e.g., 3B, 7B parameters), we rigorously evaluate the impact of these data manipulations on student model performance across multiple reasoning datasets, with a focus on in-distribution (IID) and out-of-distribution (OOD) generalization, and cross-domain transfer. Our findings aim to provide actionable insights and establish best practices for optimizing CoT distillation through data-centric techniques, ultimately facilitating the development of more accessible and capable reasoning models. The dataset can be found at https://huggingface.co/datasets/rana-shahroz/DC-COT, while our code is shared in https://anonymous.4open.science/r/DC-COT-FF4C/.

MAMay 11
Skill Description Deception Attack against Task Routing in Internet of Agents

Jiayi He, Xiaofeng Luo, Jiawen Kang et al.

A new paradigm, Internet of Agents (IoA), is transforming networked systems into LLM-driven service networks, where heterogeneous agents collaborate through task routing based on their self-declared skill descriptions. Although this promising paradigm enables agentic, distributed, and advanced intelligence, it also exposes a new and overlooked attack surface. In particular, malicious agents can strategically manipulate their skill descriptions to bias routing decisions and increase their probability of being selected for task execution, thereby disrupting user tasks and degrading system reliability. To characterize this threat, we propose and formalize a new attack model, termed \emph{Skill Description Deception} (SDD) attack. We further design an LLM-enabled SDD attack framework that automatically generates deceptive skill descriptions, enabling systematic vulnerability assessment of IoA systems. Experimental results on nine representative domains show that the proposed attack can achieve up to 98\% attack success rate, demonstrating the severity and generality of the attack. Our paper reveals a new security vulnerability in IoA and calls for secure and trustworthy semantic routing mechanisms for future IoA systems.

CLMay 9
Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

Guijin Son, Seungone Kim, Catherine Arnett et al.

Following the recent achievement of gold-medal performance on the IMO by frontier LLMs, the community is searching for the next meaningful and challenging target for measuring LLM reasoning. Whereas olympiad-style problems measure step-by-step reasoning alone, research-level problems use such reasoning to advance the frontier of mathematical knowledge itself, emerging as a compelling alternative. Yet research-level math benchmarks remain scarce because such problems are difficult to source (e.g., Riemann Bench and FrontierMath-Tier 4 contain 25 and 50 problems, respectively). To support reliable evaluation of next-generation frontier models, we introduce Soohak, a 439-problem benchmark newly authored from scratch by 64 mathematicians. Soohak comprises two subsets. On the Challenge subset, frontier models including Gemini-3-Pro, GPT-5, and Claude-Opus-4.5 reach 30.4%, 26.4%, and 10.4% respectively, leaving substantial headroom, while leading open-weight models such as Qwen3-235B, GPT-OSS-120B, and Kimi-2.5 remain below 15%. Notably, beyond standard problem solving, Soohak introduces a refusal subset that probes a capability intrinsic to research mathematics: recognizing ill-posed problems and pausing rather than producing confident but unjustified answers. On this subset, no model exceeds 50%, identifying refusal as a new optimization target that current models do not directly address. To prevent contamination, the dataset will be publicly released in late 2026, with model evaluations available upon request in the interim.

CVOct 11, 2025Code
EditCast3D: Single-Frame-Guided 3D Editing with Video Propagation and View Selection

Huaizhi Qu, Ruichen Zhang, Shuqing Luo et al.

Recent advances in foundation models have driven remarkable progress in image editing, yet their extension to 3D editing remains underexplored. A natural approach is to replace the image editing modules in existing workflows with foundation models. However, their heavy computational demands and the restrictions and costs of closed-source APIs make plugging these models into existing iterative editing strategies impractical. To address this limitation, we propose EditCast3D, a pipeline that employs video generation foundation models to propagate edits from a single first frame across the entire dataset prior to reconstruction. While editing propagation enables dataset-level editing via video models, its consistency remains suboptimal for 3D reconstruction, where multi-view alignment is essential. To overcome this, EditCast3D introduces a view selection strategy that explicitly identifies consistent and reconstruction-friendly views and adopts feedforward reconstruction without requiring costly refinement. In combination, the pipeline both minimizes reliance on expensive image editing and mitigates prompt ambiguities that arise when applying foundation models independently across images. We evaluate EditCast3D on commonly used 3D editing datasets and compare it against state-of-the-art 3D editing baselines, demonstrating superior editing quality and high efficiency. These results establish EditCast3D as a scalable and general paradigm for integrating foundation models into 3D editing pipelines. The code is available at https://github.com/UNITES-Lab/EditCast3D

CVJul 17, 2025Code
SparseC-AFM: a deep learning method for fast and accurate characterization of MoS$_2$ with C-AFM

Levi Harris, Md Jayed Hossain, Mufan Qiu et al.

The increasing use of two-dimensional (2D) materials in nanoelectronics demands robust metrology techniques for electrical characterization, especially for large-scale production. While atomic force microscopy (AFM) techniques like conductive AFM (C-AFM) offer high accuracy, they suffer from slow data acquisition speeds due to the raster scanning process. To address this, we introduce SparseC-AFM, a deep learning model that rapidly and accurately reconstructs conductivity maps of 2D materials like MoS$_2$ from sparse C-AFM scans. Our approach is robust across various scanning modes, substrates, and experimental conditions. We report a comparison between (a) classic flow implementation, where a high pixel density C-AFM image (e.g., 15 minutes to collect) is manually parsed to extract relevant material parameters, and (b) our SparseC-AFM method, which achieves the same operation using data that requires substantially less acquisition time (e.g., under 5 minutes). SparseC-AFM enables efficient extraction of critical material parameters in MoS$_2$, including film coverage, defect density, and identification of crystalline island boundaries, edges, and cracks. We achieve over 11x reduction in acquisition time compared to manual extraction from a full-resolution C-AFM image. Moreover, we demonstrate that our model-predicted samples exhibit remarkably similar electrical properties to full-resolution data gathered using classic-flow scanning. This work represents a significant step toward translating AI-assisted 2D material characterization from laboratory research to industrial fabrication. Code and model weights are available at github.com/UNITES-Lab/sparse-cafm.

NIApr 14, 2024
Generative AI Agents with Large Language Model for Satellite Networks via a Mixture of Experts Transmission

Ruichen Zhang, Hongyang Du, Yinqiu Liu et al.

In response to the needs of 6G global communications, satellite communication networks have emerged as a key solution. However, the large-scale development of satellite communication networks is constrained by the complex system models, whose modeling is challenging for massive users. Moreover, transmission interference between satellites and users seriously affects communication performance. To solve these problems, this paper develops generative artificial intelligence (AI) agents for model formulation and then applies a mixture of experts (MoE) approach to design transmission strategies. Specifically, we leverage large language models (LLMs) to build an interactive modeling paradigm and utilize retrieval-augmented generation (RAG) to extract satellite expert knowledge that supports mathematical modeling. Afterward, by integrating the expertise of multiple specialized components, we propose an MoE-proximal policy optimization (PPO) approach to solve the formulated problem. Each expert can optimize the optimization variables at which it excels through specialized training through its own network and then aggregates them through the gating network to perform joint optimization. The simulation results validate the accuracy and effectiveness of employing a generative agent for problem formulation. Furthermore, the superiority of the proposed MoE-ppo approach over other benchmarks is confirmed in solving the formulated problem. The adaptability of MoE-PPO to various customized modeling problems has also been demonstrated.

AIDec 17, 2025
Agentic AI for Integrated Sensing and Communication: Analysis, Framework, and Case Study

Wenwen Xie, Geng Sun, Ruichen Zhang et al.

Integrated sensing and communication (ISAC) has emerged as a key development direction in the sixth-generation (6G) era, which provides essential support for the collaborative sensing and communication of future intelligent networks. However, as wireless environments become increasingly dynamic and complex, ISAC systems require more intelligent processing and more autonomous operation to maintain efficiency and adaptability. Meanwhile, agentic artificial intelligence (AI) offers a feasible solution to address these challenges by enabling continuous perception-reasoning-action loops in dynamic environments to support intelligent, autonomous, and efficient operation for ISAC systems. As such, we delve into the application value and prospects of agentic AI in ISAC systems in this work. Firstly, we provide a comprehensive review of agentic AI and ISAC systems to demonstrate their key characteristics. Secondly, we show several common optimization approaches for ISAC systems and highlight the significant advantages of generative artificial intelligence (GenAI)-based agentic AI. Thirdly, we propose a novel agentic ISAC framework and prensent a case study to verify its superiority in optimizing ISAC performance. Finally, we clarify future research directions for agentic AI-based ISAC systems.

NIFeb 24, 2025
Toward Agentic AI: Generative Information Retrieval Inspired Intelligent Communications and Networking

Ruichen Zhang, Shunpu Tang, Yinqiu Liu et al.

The increasing complexity and scale of modern telecommunications networks demand intelligent automation to enhance efficiency, adaptability, and resilience. Agentic AI has emerged as a key paradigm for intelligent communications and networking, enabling AI-driven agents to perceive, reason, decide, and act within dynamic networking environments. However, effective decision-making in telecom applications, such as network planning, management, and resource allocation, requires integrating retrieval mechanisms that support multi-hop reasoning, historical cross-referencing, and compliance with evolving 3GPP standards. This article presents a forward-looking perspective on generative information retrieval-inspired intelligent communications and networking, emphasizing the role of knowledge acquisition, processing, and retrieval in agentic AI for telecom systems. We first provide a comprehensive review of generative information retrieval strategies, including traditional retrieval, hybrid retrieval, semantic retrieval, knowledge-based retrieval, and agentic contextual retrieval. We then analyze their advantages, limitations, and suitability for various networking scenarios. Next, we present a survey about their applications in communications and networking. Additionally, we introduce an agentic contextual retrieval framework to enhance telecom-specific planning by integrating multi-source retrieval, structured reasoning, and self-reflective validation. Experimental results demonstrate that our framework significantly improves answer accuracy, explanation consistency, and retrieval efficiency compared to traditional and semantic retrieval methods. Finally, we outline future research directions.

NIApr 28, 2024
Generative AI for Low-Carbon Artificial Intelligence of Things with Large Language Models

Jinbo Wen, Ruichen Zhang, Dusit Niyato et al.

By integrating Artificial Intelligence (AI) with the Internet of Things (IoT), Artificial Intelligence of Things (AIoT) has revolutionized many fields. However, AIoT is facing the challenges of energy consumption and carbon emissions due to the continuous advancement of mobile technology. Fortunately, Generative AI (GAI) holds immense potential to reduce carbon emissions of AIoT due to its excellent reasoning and generation capabilities. In this article, we explore the potential of GAI for carbon emissions reduction and propose a novel GAI-enabled solution for low-carbon AIoT. Specifically, we first study the main impacts that cause carbon emissions in AIoT, and then introduce GAI techniques and their relations to carbon emissions. We then explore the application prospects of GAI in low-carbon AIoT, focusing on how GAI can reduce carbon emissions of network components. Subsequently, we propose a Large Language Model (LLM)-enabled carbon emission optimization framework, in which we design pluggable LLM and Retrieval Augmented Generation (RAG) modules to generate more accurate and reliable optimization problems. Furthermore, we utilize Generative Diffusion Models (GDMs) to identify optimal strategies for carbon emission reduction. Numerical results demonstrate the effectiveness of the proposed framework. Finally, we insightfully provide open research directions for low-carbon AIoT.

MAFeb 11, 2025
Symbiotic Cooperation for Web Agents: Harnessing Complementary Strengths of Large and Small LLMs

Ruichen Zhang, Mufan Qiu, Zhen Tan et al.

Web browsing agents powered by large language models (LLMs) have shown tremendous potential in automating complex web-based tasks. Existing approaches typically rely on large LLMs (e.g., GPT-4o) to explore web environments and generate trajectory data, which is then used either for demonstration retrieval (for large LLMs) or to distill small LLMs (e.g., Llama3) in a process that remains decoupled from the exploration. In this paper, we propose AgentSymbiotic, an iterative framework that couples data synthesis with task-performance, yielding a "symbiotic improvement" for both large and small LLMs. Our study uncovers a complementary dynamic between LLM types: while large LLMs excel at generating high-quality trajectories for distillation, the distilled small LLMs-owing to their distinct reasoning capabilities-often choose actions that diverge from those of their larger counterparts. This divergence drives the exploration of novel trajectories, thereby enriching the synthesized data. However, we also observe that the performance of small LLMs becomes a bottleneck in this iterative enhancement process. To address this, we propose two innovations in LLM distillation: a speculative data synthesis strategy that mitigates off-policy bias, and a multi-task learning approach designed to boost the reasoning capabilities of the student LLM. Furthermore, we introduce a Hybrid Mode for Privacy Preservation to address user privacy concerns. Evaluated on the WEBARENA benchmark, AgentSymbiotic achieves SOTA performance with both LLM types. Our best Large LLM agent reaches 52%, surpassing the previous best of 45%, while our 8B distilled model demonstrates a competitive 49%, exceeding the prior best of 28%. Code will be released upon acceptance.

AIMay 31, 2025
World Models for Cognitive Agents: Transforming Edge Intelligence in Future Networks

Changyuan Zhao, Ruichen Zhang, Jiacheng Wang et al.

World models are emerging as a transformative paradigm in artificial intelligence, enabling agents to construct internal representations of their environments for predictive reasoning, planning, and decision-making. By learning latent dynamics, world models provide a sample-efficient framework that is especially valuable in data-constrained or safety-critical scenarios. In this paper, we present a comprehensive overview of world models, highlighting their architecture, training paradigms, and applications across prediction, generation, planning, and causal reasoning. We compare and distinguish world models from related concepts such as digital twins, the metaverse, and foundation models, clarifying their unique role as embedded cognitive engines for autonomous agents. We further propose Wireless Dreamer, a novel world model-based reinforcement learning framework tailored for wireless edge intelligence optimization, particularly in low-altitude wireless networks (LAWNs). Through a weather-aware UAV trajectory planning case study, we demonstrate the effectiveness of our framework in improving learning efficiency and decision quality.

ITNov 27, 2024
Generative Semantic Communication for Joint Image Transmission and Segmentation

Weiwen Yuan, Jinke Ren, Chongjie Wang et al.

Semantic communication has emerged as a promising technology for enhancing communication efficiency. However, most existing research emphasizes single-task reconstruction, neglecting model adaptability and generalization across multi-task systems. In this paper, we propose a novel generative semantic communication system that supports both image reconstruction and segmentation tasks. Our approach builds upon semantic knowledge bases (KBs) at both the transmitter and receiver, with each semantic KB comprising a source KB and a task KB. The source KB at the transmitter leverages a hierarchical Swin-Transformer, a generative AI scheme, to extract multi-level features from the input image. Concurrently, the counterpart source KB at the receiver utilizes hierarchical residual blocks to generate task-specific knowledge. Furthermore, the task KBs adopt a semantic similarity model to map different task requirements into pre-defined task instructions, thereby facilitating the feature selection of the source KBs. Additionally, we develop a unified residual block-based joint source and channel (JSCC) encoder and two task-specific JSCC decoders to achieve the two image tasks. In particular, a generative diffusion model is adopted to construct the JSCC decoder for the image reconstruction task. Experimental results show that our multi-task generative semantic communication system outperforms previous single-task communication systems in terms of peak signal-to-noise ratio and segmentation accuracy.

AIMay 27, 2025
Large Language Model-enhanced Reinforcement Learning for Low-Altitude Economy Networking

Lingyi Cai, Ruichen Zhang, Changyuan Zhao et al.

Low-Altitude Economic Networking (LAENet) aims to support diverse flying applications below 1,000 meters by deploying various aerial vehicles for flexible and cost-effective aerial networking. However, complex decision-making, resource constraints, and environmental uncertainty pose significant challenges to the development of the LAENet. Reinforcement learning (RL) offers a potential solution in response to these challenges but has limitations in generalization, reward design, and model stability. The emergence of large language models (LLMs) offers new opportunities for RL to mitigate these limitations. In this paper, we first present a tutorial about integrating LLMs into RL by using the capacities of generation, contextual understanding, and structured reasoning of LLMs. We then propose an LLM-enhanced RL framework for the LAENet in terms of serving the LLM as information processor, reward designer, decision-maker, and generator. Moreover, we conduct a case study by using LLMs to design a reward function to improve the learning performance of RL in the LAENet. Finally, we provide a conclusion and discuss future work.

NIMay 18, 2025
LAMeTA: Intent-Aware Agentic Network Optimization via a Large AI Model-Empowered Two-Stage Approach

Yinqiu Liu, Guangyuan Liu, Jiacheng Wang et al.

Nowadays, Generative AI (GenAI) reshapes numerous domains by enabling machines to create content across modalities. As GenAI evolves into autonomous agents capable of reasoning, collaboration, and interaction, they are increasingly deployed on network infrastructures to serve humans automatically. This emerging paradigm, known as the agentic network, presents new optimization challenges due to the demand to incorporate subjective intents of human users expressed in natural language. Traditional generic Deep Reinforcement Learning (DRL) struggles to capture intent semantics and adjust policies dynamically, thus leading to suboptimality. In this paper, we present LAMeTA, a Large AI Model (LAM)-empowered Two-stage Approach for intent-aware agentic network optimization. First, we propose Intent-oriented Knowledge Distillation (IoKD), which efficiently distills intent-understanding capabilities from resource-intensive LAMs to lightweight edge LAMs (E-LAMs) to serve end users. Second, we develop Symbiotic Reinforcement Learning (SRL), integrating E-LAMs with a policy-based DRL framework. In SRL, E-LAMs translate natural language user intents into structured preference vectors that guide both state representation and reward design. The DRL, in turn, optimizes the generative service function chain composition and E-LAM selection based on real-time network conditions, thus optimizing the subjective Quality-of-Experience (QoE). Extensive experiments conducted in an agentic network with 81 agents demonstrate that IoKD reduces mean squared error in intent prediction by up to 22.5%, while SRL outperforms conventional generic DRL by up to 23.5% in maximizing intent-aware QoE.

LGJan 16, 2025
Empowering Large Language Models in Wireless Communication: A Novel Dataset and Fine-Tuning Framework

Yushen Lin, Ruichen Zhang, Wenqi Huang et al.

In this work, we develop a specialized dataset aimed at enhancing the evaluation and fine-tuning of large language models (LLMs) specifically for wireless communication applications. The dataset includes a diverse set of multi-hop questions, including true/false and multiple-choice types, spanning varying difficulty levels from easy to hard. By utilizing advanced language models for entity extraction and question generation, rigorous data curation processes are employed to maintain high quality and relevance. Additionally, we introduce a Pointwise V-Information (PVI) based fine-tuning method, providing a detailed theoretical analysis and justification for its use in quantifying the information content of training data with 2.24\% and 1.31\% performance boost for different models compared to baselines, respectively. To demonstrate the effectiveness of the fine-tuned models with the proposed methodologies on practical tasks, we also consider different tasks, including summarizing optimization problems from technical papers and solving the mathematical problems related to non-orthogonal multiple access (NOMA), which are generated by using the proposed multi-agent framework. Simulation results show significant performance gain in summarization tasks with 20.9\% in the ROUGE-L metrics. We also study the scaling laws of fine-tuning LLMs and the challenges LLMs face in the field of wireless communications, offering insights into their adaptation to wireless communication tasks. This dataset and fine-tuning methodology aim to enhance the training and evaluation of LLMs, contributing to advancements in LLMs for wireless communication research and applications.

NIApr 13, 2024
Large Language Model Empowered Next-Generation MIMO Networks: Fundamentals, Challenges, and Visions

Zhe Wang, Jiayi Zhang, Hongyang Du et al.

Next-generation Multiple-Input Multiple-Output (MIMO) is expected to be intelligent and scalable. In this paper, we study Large Language Model (LLM)-enabled next-generation MIMO networks. Firstly, we provide an overview of the development, fundamentals, and challenges of the next-generation MIMO. Then, we propose the concept of the generative AI agent, which is capable of generating tailored and specialized contents with the aid of LLM and Retrieval Augmented Generation (RAG). Next, we comprehensively discuss the features and advantages of the generative AI agent framework. More importantly, to tackle existing challenges of next-generation MIMO, we discuss generative AI agent-enabled next-generation MIMO networks from the perspective of performance analysis, signal processing, and resource allocation. Furthermore, we present two compelling case studies that demonstrate the effectiveness of leveraging the generative AI agent for performance analysis in complex configuration scenarios. These examples highlight how the integration of generative AI agents can significantly enhance the analysis and design of next-generation MIMO systems. Finally, we discuss important potential research future directions.

LGAug 13, 2025
Edge General Intelligence Through World Models and Agentic AI: Fundamentals, Solutions, and Challenges

Changyuan Zhao, Guangyuan Liu, Ruichen Zhang et al.

Edge General Intelligence (EGI) represents a transformative evolution of edge computing, where distributed agents possess the capability to perceive, reason, and act autonomously across diverse, dynamic environments. Central to this vision are world models, which act as proactive internal simulators that not only predict but also actively imagine future trajectories, reason under uncertainty, and plan multi-step actions with foresight. This proactive nature allows agents to anticipate potential outcomes and optimize decisions ahead of real-world interactions. While prior works in robotics and gaming have showcased the potential of world models, their integration into the wireless edge for EGI remains underexplored. This survey bridges this gap by offering a comprehensive analysis of how world models can empower agentic artificial intelligence (AI) systems at the edge. We first examine the architectural foundations of world models, including latent representation learning, dynamics modeling, and imagination-based planning. Building on these core capabilities, we illustrate their proactive applications across EGI scenarios such as vehicular networks, unmanned aerial vehicle (UAV) networks, the Internet of Things (IoT) systems, and network functions virtualization, thereby highlighting how they can enhance optimization under latency, energy, and privacy constraints. We then explore their synergy with foundation models and digital twins, positioning world models as the cognitive backbone of EGI. Finally, we highlight open challenges, such as safety guarantees, efficient training, and constrained deployment, and outline future research directions. This survey provides both a conceptual foundation and a practical roadmap for realizing the next generation of intelligent, autonomous edge systems.

NIJul 19, 2025
Agentic Satellite-Augmented Low-Altitude Economy and Terrestrial Networks: A Survey on Generative Approaches

Xiaozheng Gao, Yichen Wang, Bosen Liu et al.

The development of satellite-augmented low-altitude economy and terrestrial networks (SLAETNs) demands intelligent and autonomous systems that can operate reliably across heterogeneous, dynamic, and mission-critical environments. To address these challenges, this survey focuses on enabling agentic artificial intelligence (AI), that is, artificial agents capable of perceiving, reasoning, and acting, through generative AI (GAI) and large language models (LLMs). We begin by introducing the architecture and characteristics of SLAETNs, and analyzing the challenges that arise in integrating satellite, aerial, and terrestrial components. Then, we present a model-driven foundation by systematically reviewing five major categories of generative models: variational autoencoders (VAEs), generative adversarial networks (GANs), generative diffusion models (GDMs), transformer-based models (TBMs), and LLMs. Moreover, we provide a comparative analysis to highlight their generative mechanisms, capabilities, and deployment trade-offs within SLAETNs. Building on this foundation, we examine how these models empower agentic functions across three domains: communication enhancement, security and privacy protection, and intelligent satellite tasks. Finally, we outline key future directions for building scalable, adaptive, and trustworthy generative agents in SLAETNs. This survey aims to provide a unified understanding and actionable reference for advancing agentic AI in next-generation integrated networks.

LGMar 13, 2025
Deep Learning Approaches for Anti-Money Laundering on Mobile Transactions: Review, Framework, and Directions

Jiani Fan, Lwin Khin Shar, Ruichen Zhang et al.

Money laundering is a financial crime that obscures the origin of illicit funds, necessitating the development and enforcement of anti-money laundering (AML) policies by governments and organizations. The proliferation of mobile payment platforms and smart IoT devices has significantly complicated AML investigations. As payment networks become more interconnected, there is an increasing need for efficient real-time detection to process large volumes of transaction data on heterogeneous payment systems by different operators such as digital currencies, cryptocurrencies and account-based payments. Most of these mobile payment networks are supported by connected devices, many of which are considered loT devices in the FinTech space that constantly generate data. Furthermore, the growing complexity and unpredictability of transaction patterns across these networks contribute to a higher incidence of false positives. While machine learning solutions have the potential to enhance detection efficiency, their application in AML faces unique challenges, such as addressing privacy concerns tied to sensitive financial data and managing the real-world constraint of limited data availability due to data regulations. Existing surveys in the AML literature broadly review machine learning approaches for money laundering detection, but they often lack an in-depth exploration of advanced deep learning techniques - an emerging field with significant potential. To address this gap, this paper conducts a comprehensive review of deep learning solutions and the challenges associated with their use in AML. Additionally, we propose a novel framework that applies the least-privilege principle by integrating machine learning techniques, codifying AML red flags, and employing account profiling to provide context for predictions and enable effective fraud detection under limited data availability....

NIFeb 17, 2025
Intelligent Mobile AI-Generated Content Services via Interactive Prompt Engineering and Dynamic Service Provisioning

Yinqiu Liu, Ruichen Zhang, Jiacheng Wang et al.

Due to massive computational demands of large generative models, AI-Generated Content (AIGC) can organize collaborative Mobile AIGC Service Providers (MASPs) at network edges to provide ubiquitous and customized content generation for resource-constrained users. However, such a paradigm faces two significant challenges: 1) raw prompts (i.e., the task description from users) often lead to poor generation quality due to users' lack of experience with specific AIGC models, and 2) static service provisioning fails to efficiently utilize computational and communication resources given the heterogeneity of AIGC tasks. To address these challenges, we propose an intelligent mobile AIGC service scheme. Firstly, we develop an interactive prompt engineering mechanism that leverages a Large Language Model (LLM) to generate customized prompt corpora and employs Inverse Reinforcement Learning (IRL) for policy imitation through small-scale expert demonstrations. Secondly, we formulate a dynamic mobile AIGC service provisioning problem that jointly optimizes the number of inference trials and transmission power allocation. Then, we propose the Diffusion-Enhanced Deep Deterministic Policy Gradient (D3PG) algorithm to solve the problem. By incorporating the diffusion process into Deep Reinforcement Learning (DRL) architecture, the environment exploration capability can be improved, thus adapting to varying mobile AIGC scenarios. Extensive experimental results demonstrate that our prompt engineering approach improves single-round generation success probability by 6.3 times, while D3PG increases the user service experience by 67.8% compared to baseline DRL approaches.

SYApr 9
Networking-Aware Energy Efficiency in Agentic AI Inference: A Survey

Xiaojing Chen, Haiqi Yu, Wei Ni et al.

The rapid emergence of Large Language Models (LLMs) has catalyzed Agentic artificial intelligence (AI), autonomous systems integrating perception, reasoning, and action into closed-loop pipelines for continuous adaptation. While unlocking transformative applications in mobile edge computing, autonomous systems, and next-generation wireless networks, this paradigm creates fundamental energy challenges through iterative inference and persistent data exchange. Unlike traditional AI where bottlenecks are computational Floating Point Operations (FLOPs), Agentic AI faces compounding computational and communication energy costs. In this survey, we propose an energy accounting framework identifying computational and communication costs across the Perception-Reasoning-Action cycle. We establish a unified taxonomy spanning model simplification, computation control, input and attention optimization, and hardware-aware inference. We explore cross-layer co-design strategies jointly optimizing model parameters, wireless transmissions, and edge resources. Finally, we identify open challenges of federated green learning, carbon-aware agency, 6th generation mobile communication (6G)-native Agentic AI, and self-sustaining systems, providing a roadmap for scalable autonomous intelligence.

CRJan 7
State Backdoor: Towards Stealthy Real-world Poisoning Attack on Vision-Language-Action Model in State Space

Ji Guo, Wenbo Jiang, Yansong Lin et al.

Vision-Language-Action (VLA) models are widely deployed in safety-critical embodied AI applications such as robotics. However, their complex multimodal interactions also expose new security vulnerabilities. In this paper, we investigate a backdoor threat in VLA models, where malicious inputs cause targeted misbehavior while preserving performance on clean data. Existing backdoor methods predominantly rely on inserting visible triggers into visual modality, which suffer from poor robustness and low insusceptibility in real-world settings due to environmental variability. To overcome these limitations, we introduce the State Backdoor, a novel and practical backdoor attack that leverages the robot arm's initial state as the trigger. To optimize trigger for insusceptibility and effectiveness, we design a Preference-guided Genetic Algorithm (PGA) that efficiently searches the state space for minimal yet potent triggers. Extensive experiments on five representative VLA models and five real-world tasks show that our method achieves over 90% attack success rate without affecting benign task performance, revealing an underexplored vulnerability in embodied AI systems.

LGAug 3, 2025
Diffusion Models for Future Networks and Communications: A Comprehensive Survey

Nguyen Cong Luong, Nguyen Duc Hai, Duc Van Le et al.

The rise of Generative AI (GenAI) in recent years has catalyzed transformative advances in wireless communications and networks. Among the members of the GenAI family, Diffusion Models (DMs) have risen to prominence as a powerful option, capable of handling complex, high-dimensional data distribution, as well as consistent, noise-robust performance. In this survey, we aim to provide a comprehensive overview of the theoretical foundations and practical applications of DMs across future communication systems. We first provide an extensive tutorial of DMs and demonstrate how they can be applied to enhance optimizers, reinforcement learning and incentive mechanisms, which are popular approaches for problems in wireless networks. Then, we review and discuss the DM-based methods proposed for emerging issues in future networks and communications, including channel modeling and estimation, signal detection and data reconstruction, integrated sensing and communication, resource management in edge computing networks, semantic communications and other notable issues. We conclude the survey with highlighting technical limitations of DMs and their applications, as well as discussing future research directions.

LGJan 26, 2025
Advancing Generative Artificial Intelligence and Large Language Models for Demand Side Management with Internet of Electric Vehicles

Hanwen Zhang, Ruichen Zhang, Wei Zhang et al.

Generative artificial intelligence, particularly through large language models (LLMs), is poised to transform energy optimization and demand side management (DSM) within microgrids. This paper explores the integration of LLMs into energy management, emphasizing their roles in automating the optimization of DSM strategies with Internet of electric vehicles. We investigate challenges and solutions associated with DSM and explore the new opportunities presented by leveraging LLMs. Then, we propose an innovative solution that enhances LLMs with retrieval-augmented generation for automatic problem formulation, code generation, and customizing optimization. We present a case study to demonstrate the effectiveness of our proposed solution in charging scheduling and optimization for electric vehicles, highlighting our solution's significant advancements in energy efficiency and user adaptability. This work underscores the potential of LLMs for energy optimization and fosters a new era of intelligent DSM solutions.

CVOct 29, 2024
FairSkin: Fair Diffusion for Skin Disease Image Generation

Ruichen Zhang, Yuguang Yao, Zhen Tan et al.

Image generation is a prevailing technique for clinical data augmentation for advancing diagnostic accuracy and reducing healthcare disparities. Diffusion Model (DM) has become a leading method in generating synthetic medical images, but it suffers from a critical twofold bias: (1) The quality of images generated for Caucasian individuals is significantly higher, as measured by the Frechet Inception Distance (FID). (2) The ability of the downstream-task learner to learn critical features from disease images varies across different skin tones. These biases pose significant risks, particularly in skin disease detection, where underrepresentation of certain skin tones can lead to misdiagnosis or neglect of specific conditions. To address these challenges, we propose FairSkin, a novel DM framework that mitigates these biases through a three-level resampling mechanism, ensuring fairer representation across racial and disease categories. Our approach significantly improves the diversity and quality of generated images, contributing to more equitable skin disease detection in clinical settings.

LGMar 3, 2025
GRNFormer: A Biologically-Guided Framework for Integrating Gene Regulatory Networks into RNA Foundation Models

Mufan Qiu, Xinyu Hu, Fengwei Zhan et al.

Foundation models for single-cell RNA sequencing (scRNA-seq) have shown promising capabilities in capturing gene expression patterns. However, current approaches face critical limitations: they ignore biological prior knowledge encoded in gene regulatory relationships and fail to leverage multi-omics signals that could provide complementary regulatory insights. In this paper, we propose GRNFormer, a new framework that systematically integrates multi-scale Gene Regulatory Networks (GRNs) inferred from multi-omics data into RNA foundation model training. Our framework introduces two key innovations. First, we introduce a pipeline for constructing hierarchical GRNs that capture regulatory relationships at both cell-type-specific and cell-specific resolutions. Second, we design a structure-aware integration framework that addresses the information asymmetry in GRNs through two technical advances: (1) A graph topological adapter using multi-head cross-attention to weight regulatory relationships dynamically, and (2) a novel edge perturbation strategy that perturb GRNs with biologically-informed co-expression links to augment graph neural network training. Comprehensive experiments have been conducted on three representative downstream tasks across multiple model architectures to demonstrate the effectiveness of GRNFormer. It achieves consistent improvements over state-of-the-art (SoTA) baselines: $3.6\%$ increase in drug response prediction correlation, $9.6\%$ improvement in single-cell drug classification AUC, and $1.1\%$ average gain in gene perturbation prediction accuracy.

LGOct 11, 2025
Efficient Onboard Vision-Language Inference in UAV-Enabled Low-Altitude Economy Networks via LLM-Enhanced Optimization

Yang Li, Ruichen Zhang, Yinqiu Liu et al.

The rapid advancement of Low-Altitude Economy Networks (LAENets) has enabled a variety of applications, including aerial surveillance, environmental sensing, and semantic data collection. To support these scenarios, unmanned aerial vehicles (UAVs) equipped with onboard vision-language models (VLMs) offer a promising solution for real-time multimodal inference. However, ensuring both inference accuracy and communication efficiency remains a significant challenge due to limited onboard resources and dynamic network conditions. In this paper, we first propose a UAV-enabled LAENet system model that jointly captures UAV mobility, user-UAV communication, and the onboard visual question answering (VQA) pipeline. Based on this model, we formulate a mixed-integer non-convex optimization problem to minimize task latency and power consumption under user-specific accuracy constraints. To solve the problem, we design a hierarchical optimization framework composed of two parts: (i) an Alternating Resolution and Power Optimization (ARPO) algorithm for resource allocation under accuracy constraints, and (ii) a Large Language Model-augmented Reinforcement Learning Approach (LLaRA) for adaptive UAV trajectory optimization. The large language model (LLM) serves as an expert in refining reward design of reinforcement learning in an offline fashion, introducing no additional latency in real-time decision-making. Numerical results demonstrate the efficacy of our proposed framework in improving inference performance and communication efficiency under dynamic LAENet conditions.

LGOct 10, 2025
Robust Driving Control for Autonomous Vehicles: An Intelligent General-sum Constrained Adversarial Reinforcement Learning Approach

Junchao Fan, Qi Wei, Ruichen Zhang et al.

Deep reinforcement learning (DRL) has demonstrated remarkable success in developing autonomous driving policies. However, its vulnerability to adversarial attacks remains a critical barrier to real-world deployment. Although existing robust methods have achieved success, they still suffer from three key issues: (i) these methods are trained against myopic adversarial attacks, limiting their abilities to respond to more strategic threats, (ii) they have trouble causing truly safety-critical events (e.g., collisions), but instead often result in minor consequences, and (iii) these methods can introduce learning instability and policy drift during training due to the lack of robust constraints. To address these issues, we propose Intelligent General-sum Constrained Adversarial Reinforcement Learning (IGCARL), a novel robust autonomous driving approach that consists of a strategic targeted adversary and a robust driving agent. The strategic targeted adversary is designed to leverage the temporal decision-making capabilities of DRL to execute strategically coordinated multi-step attacks. In addition, it explicitly focuses on inducing safety-critical events by adopting a general-sum objective. The robust driving agent learns by interacting with the adversary to develop a robust autonomous driving policy against adversarial attacks. To ensure stable learning in adversarial environments and to mitigate policy drift caused by attacks, the agent is optimized under a constrained formulation. Extensive experiments show that IGCARL improves the success rate by at least 27.9% over state-of-the-art methods, demonstrating superior robustness to adversarial attacks and enhancing the safety and reliability of DRL-based autonomous driving.

AIOct 7, 2025
From Agentification to Self-Evolving Agentic AI for Wireless Networks: Concepts, Approaches, and Future Research Directions

Changyuan Zhao, Ruichen Zhang, Jiacheng Wang et al.

Self-evolving agentic artificial intelligence (AI) offers a new paradigm for future wireless systems by enabling autonomous agents to continually adapt and improve without human intervention. Unlike static AI models, self-evolving agents embed an autonomous evolution cycle that updates models, tools, and workflows in response to environmental dynamics. This paper presents a comprehensive overview of self-evolving agentic AI, highlighting its layered architecture, life cycle, and key techniques, including tool intelligence, workflow optimization, self-reflection, and evolutionary learning. We further propose a multi-agent cooperative self-evolving agentic AI framework, where multiple large language models (LLMs) are assigned role-specialized prompts under the coordination of a supervisor agent. Through structured dialogue, iterative feedback, and systematic validation, the system autonomously executes the entire life cycle without human intervention. A case study on antenna evolution in low-altitude wireless networks (LAWNs) demonstrates how the framework autonomously upgrades fixed antenna optimization into movable antenna optimization. Experimental results show that the proposed self-evolving agentic AI autonomously improves beam gain and restores degraded performance by up to 52.02%, consistently surpassing the fixed baseline with little to no human intervention and validating its adaptability and robustness for next-generation wireless intelligence.

AISep 27, 2025
Agentic AI Reasoning for Mobile Edge General Intelligence: Fundamentals, Approaches, and Directions

Mingyi Luo, Ruichen Zhang, Xiangwang Hou et al.

The rapid advancement of large language models (LLMs) has enabled an emergence of agentic artificial intelligence (AI) with powerful reasoning and autonomous decision-making capabilities. This integration with edge computing has led to the development of Mobile Edge General Intelligence (MEGI), which brings real-time, privacy-preserving reasoning to the network edge. However, deploying LLM-based agentic AI reasoning in MEGI environments poses significant challenges due to the high computational demands of reasoning and the limited resources of edge devices. To address these challenges, we propose a joint optimization framework for efficient LLM reasoning deployment in MEGI. First, we review methods that enhance LLM reasoning capabilities, such as Chain-of-Thought (CoT) prompting, Supervised Fine-Tuning (SFT), and Mixture of Experts (MoE). Next, we present a distributed framework that addresses two correlated aspects: reasoning enhancement through adaptive CoT prompting and scalable deployment through distributed MoE architecture. The framework dynamically activates expert networks and adjusts reasoning depth based on task complexity and device capabilities. We further conduct experimental evaluations in mobile edge environments. Experimental results demonstrate the framework's effectiveness in balancing reasoning quality with resource efficiency, validating the practical viability of deploying sophisticated LLM reasoning capabilities in resource-constrained MEGI environments.

ITAug 21, 2025
Way to Build Native AI-driven 6G Air Interface: Principles, Roadmap, and Outlook

Ping Zhang, Kai Niu, Yiming Liu et al.

Artificial intelligence (AI) is expected to serve as a foundational capability across the entire lifecycle of 6G networks, spanning design, deployment, and operation. This article proposes a native AI-driven air interface architecture built around two core characteristics: compression and adaptation. On one hand, compression enables the system to understand and extract essential semantic information from the source data, focusing on task relevance rather than symbol-level accuracy. On the other hand, adaptation allows the air interface to dynamically transmit semantic information across diverse tasks, data types, and channel conditions, ensuring scalability and robustness. This article first introduces the native AI-driven air interface architecture, then discusses representative enabling methodologies, followed by a case study on semantic communication in 6G non-terrestrial networks. Finally, it presents a forward-looking discussion on the future of native AI in 6G, outlining key challenges and research opportunities.

AIJun 10, 2025
Safe and Economical UAV Trajectory Planning in Low-Altitude Airspace: A Hybrid DRL-LLM Approach with Compliance Awareness

Yanwei Gong, Junchao Fan, Ruichen Zhang et al.

The rapid growth of the low-altitude economy has driven the widespread adoption of unmanned aerial vehicles (UAVs). This growing deployment presents new challenges for UAV trajectory planning in complex urban environments. However, existing studies often overlook key factors, such as urban airspace constraints and economic efficiency, which are essential in low-altitude economy contexts. Deep reinforcement learning (DRL) is regarded as a promising solution to these issues, while its practical adoption remains limited by low learning efficiency. To overcome this limitation, we propose a novel UAV trajectory planning framework that combines DRL with large language model (LLM) reasoning to enable safe, compliant, and economically viable path planning. Experimental results demonstrate that our method significantly outperforms existing baselines across multiple metrics, including data collection rate, collision avoidance, successful landing, regulatory compliance, and energy efficiency. These results validate the effectiveness of our approach in addressing UAV trajectory planning key challenges under constraints of the low-altitude economy networking.

CLDec 19, 2024
Adaptive Pruning for Large Language Models with Structural Importance Awareness

Haotian Zheng, Jinke Ren, Yushan Sun et al.

The recent advancements in large language models (LLMs) have significantly improved language understanding and generation capabilities. However, it is difficult to deploy LLMs on resource-constrained edge devices due to their high computational and storage resource demands. To address this issue, we propose a novel LLM model pruning method, namely structurally-aware adaptive pruning (SAAP), to significantly reduce the computational and memory costs while maintaining model performance. We first define an adaptive importance fusion metric to evaluate the importance of all coupled structures in LLMs by considering their homoscedastic uncertainty. Then, we rank the importance of all modules to determine the specific layers that should be pruned to meet particular performance requirements. Furthermore, we develop a new group fine-tuning strategy to improve the inference efficiency of LLMs. Finally, we evaluate the proposed SAAP method on multiple LLMs across two common tasks, i.e., zero-shot classification and text generation. Experimental results show that our SAAP method outperforms several state-of-the-art baseline methods, achieving 2.17%, 2.37%, and 2.39% accuracy gains on LLaMA-7B, Vicuna-7B, and LLaMA-13B. Additionally, SAAP improves the token generation speed by 5%, showcasing its practical advantages in resource-constrained scenarios.

CVDec 9, 2023
Perceptual Similarity guidance and text guidance optimization for Editing Real Images using Guided Diffusion Models

Ruichen Zhang

When using a diffusion model for image editing, there are times when the modified image can differ greatly from the source. To address this, we apply a dual-guidance approach to maintain high fidelity to the original in areas that are not altered. First, we employ text-guided optimization, using text embeddings to direct latent space and classifier-free guidance. Second, we use perceptual similarity guidance, optimizing latent vectors with posterior sampling via Tweedie formula during the reverse process. This method ensures the realistic rendering of both the edited elements and the preservation of the unedited parts of the original image.