Xiaocheng Zhang

CL
h-index16
8papers
51citations
Novelty49%
AI Score48

8 Papers

74.8SYApr 24
Optical Network Digital Twin -- Practical Use Cases and Architecture

Hideki Nishizawa, Toru Mano, Kazuya Anazawa et al.

With the widespread adoption of AI, machine-to-machine communications are rapidly increasing, reshaping the requirements for optical networks. Recent advances in Gaussian noise modeling for digital coherent transmission have raised expectations for digital-twin-based operation. However, unlike digital twins in wireless communication, which are already well established, significant barriers remain for commercialization in optical networks. This paper discusses the evolving requirements of optical networks in the AI era and proposes a practical Optical Network Digital Twin architecture enabling dynamic and Quality of Transmission aware operation beyond conventional management. Representative use cases, including operator-driven optimization, user-operator collaboration, and multi-operator interconnection, are presented, along with the architectural framework and key challenges toward practical deployment.

AIJan 22Code
EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience

Taofeng Xue, Chong Peng, Mianqiu Huang et al.

The development of native computer-use agents (CUA) represents a significant leap in multimodal AI. However, their potential is currently bottlenecked by the constraints of static data scaling. Existing paradigms relying primarily on passive imitation of static datasets struggle to capture the intricate causal dynamics inherent in long-horizon computer tasks. In this work, we introduce EvoCUA, a native computer use agentic model. Unlike static imitation, EvoCUA integrates data generation and policy optimization into a self-sustaining evolutionary cycle. To mitigate data scarcity, we develop a verifiable synthesis engine that autonomously generates diverse tasks coupled with executable validators. To enable large-scale experience acquisition, we design a scalable infrastructure orchestrating tens of thousands of asynchronous sandbox rollouts. Building on these massive trajectories, we propose an iterative evolving learning strategy to efficiently internalize this experience. This mechanism dynamically regulates policy updates by identifying capability boundaries -- reinforcing successful routines while transforming failure trajectories into rich supervision through error analysis and self-correction. Empirical evaluations on the OSWorld benchmark demonstrate that EvoCUA achieves a success rate of 56.7%, establishing a new open-source state-of-the-art. Notably, EvoCUA significantly outperforms the previous best open-source model, OpenCUA-72B (45.0%), and surpasses leading closed-weights models such as UI-TARS-2 (53.1%). Crucially, our results underscore the generalizability of this approach: the evolving paradigm driven by learning from experience yields consistent performance gains across foundation models of varying scales, establishing a robust and scalable path for advancing native agent capabilities.

CVJul 24, 2025Code
IntentVCNet: Bridging Spatio-Temporal Gaps for Intention-Oriented Controllable Video Captioning

Tianheng Qiu, Jingchun Gao, Jingyu Li et al.

Intent-oriented controlled video captioning aims to generate targeted descriptions for specific targets in a video based on customized user intent. Current Large Visual Language Models (LVLMs) have gained strong instruction following and visual comprehension capabilities. Although the LVLMs demonstrated proficiency in spatial and temporal understanding respectively, it was not able to perform fine-grained spatial control in time sequences in direct response to instructions. This substantial spatio-temporal gap complicates efforts to achieve fine-grained intention-oriented control in video. Towards this end, we propose a novel IntentVCNet that unifies the temporal and spatial understanding knowledge inherent in LVLMs to bridge the spatio-temporal gap from both prompting and model perspectives. Specifically, we first propose a prompt combination strategy designed to enable LLM to model the implicit relationship between prompts that characterize user intent and video sequences. We then propose a parameter efficient box adapter that augments the object semantic information in the global visual context so that the visual token has a priori information about the user intent. The final experiment proves that the combination of the two strategies can further enhance the LVLM's ability to model spatial details in video sequences, and facilitate the LVLMs to accurately generate controlled intent-oriented captions. Our proposed method achieved state-of-the-art results in several open source LVLMs and was the runner-up in the IntentVC challenge. Our code is available on https://github.com/thqiu0419/IntentVCNet.

CLSep 17, 2024
Self-Evolutionary Large Language Models through Uncertainty-Enhanced Preference Optimization

Jianing Wang, Yang Zhou, Xiaocheng Zhang et al.

Iterative preference optimization has recently become one of the de-facto training paradigms for large language models (LLMs), but the performance is still underwhelming due to too much noisy preference data yielded in the loop. To combat this issue, we present an \textbf{U}ncertainty-enhanced \textbf{P}reference \textbf{O}ptimization (UPO) framework to make the LLM self-evolve with reliable feedback. The key idea is mitigating the noisy preference data derived from the current policy and reward models by performing pair-wise uncertainty estimation and judiciously reliable feedback sampling. To reach this goal, we thus introduce an estimator model, which incorporates Monte Carlo (MC) dropout in Bayesian neural network (BNN) to perform uncertainty estimation for the preference data derived from the LLM policy. Compared to the existing methods that directly filter generated responses based on the reward score, the estimator focuses on the model uncertainty in a pair-wise manner and effectively bypasses the confirmation bias problem of the reward model. Additionally, we also propose an uncertainty-enhanced self-evolution algorithm to improve the robustness of preference optimization and encourage the LLM to generate responses with both high reward and certainty. Extensive experiments over multiple benchmarks demonstrate that our framework substantially alleviates the noisy problem and improves the performance of iterative preference optimization.

CLDec 29, 2023
EHR Interaction Between Patients and AI: NoteAid EHR Interaction

Xiaocheng Zhang, Zonghai Yao, Hong Yu

With the rapid advancement of Large Language Models (LLMs) and their outstanding performance in semantic and contextual comprehension, the potential of LLMs in specialized domains warrants exploration. This paper introduces the NoteAid EHR Interaction Pipeline, an innovative approach developed using generative LLMs to assist in patient education, a task stemming from the need to aid patients in understanding Electronic Health Records (EHRs). Building upon the NoteAid work, we designed two novel tasks from the patient's perspective: providing explanations for EHR content that patients may not understand and answering questions posed by patients after reading their EHRs. We extracted datasets containing 10,000 instances from MIMIC Discharge Summaries and 876 instances from the MADE medical notes collection, respectively, executing the two tasks through the NoteAid EHR Interaction Pipeline with these data. Performance data of LLMs on these tasks were collected and constructed as the corresponding NoteAid EHR Interaction Dataset. Through a comprehensive evaluation of the entire dataset using LLM assessment and a rigorous manual evaluation of 64 instances, we showcase the potential of LLMs in patient education. Besides, the results provide valuable data support for future exploration and applications in this domain while also supplying high-quality synthetic datasets for in-house system training.

CLOct 19, 2024
TrendFact: A Benchmark for Explainable Hotspot Perception in Fact-Checking with Natural Language Explanation

Xiaocheng Zhang, Xi Wang, Yifei Lu et al.

Fact-checking benchmarks provide standardized testing criteria for automated fact-checking systems, driving technological advancement. With the surge of misinformation on social media and the emergence of various fact-checking methods, public concern about the transparency of automated systems and the accuracy of fact-checking for high infulence events has grown. However, existing benchmarks fail to meet these urgent needs and are predominantly English-centric, hindering the progress of comprehensive fact-checking. To address these issues, we introduce TrendFact, the first benchmark capable of evaluating hotspot perception ability (HPA) and all fact-checking tasks. TrendFact consists of 7,643 curated samples sourced from trending platforms and professional fact-checking datasets, as well as an evidence library containing 366,634 entries with publication dates. Additionally, to complement existing benchmarks in evaluating system explanation consistency and HPA, we propose two new metrics: ECS and HCPI. Experimental results show that current fact-checking systems face significant limitations when evaluated on TrendFact, which facilitates the development of more robust fact-checking methods. Furthermore, to enhance the capabilities of existing advanced fact-checking systems, the reasoning large language models (RLMs), we propose FactISR, a reasoning framework that integrates dynamic evidence augmentation with influence score-based iterative self-reflection. FactISR effectively improves RLM's performance, offering new insights into explainable and complex fact-checking.

LGJan 21, 2025
SCFCRC: Simultaneously Counteract Feature Camouflage and Relation Camouflage for Fraud Detection

Xiaocheng Zhang, Zhuangzhuang Ye, GuoPing Zhao et al.

In fraud detection, fraudsters often interact with many benign users, camouflaging their features or relations to hide themselves. Most existing work concentrates solely on either feature camouflage or relation camouflage, or decoupling feature learning and relation learning to avoid the two camouflage from affecting each other. However, this inadvertently neglects the valuable information derived from features or relations, which could mutually enhance their adversarial camouflage strategies. In response to this gap, we propose SCFCRC, a Transformer-based fraud detector that Simultaneously Counteract Feature Camouflage and Relation Camouflage. SCFCRC consists of two components: Feature Camouflage Filter and Relation Camouflage Refiner. The feature camouflage filter utilizes pseudo labels generated through label propagation to train the filter and uses contrastive learning that combines instance-wise and prototype-wise to improve the quality of features. The relation camouflage refiner uses Mixture-of-Experts(MoE) network to disassemble the multi-relations graph into multiple substructures and divide and conquer them to mitigate the degradation of detection performance caused by relation camouflage. Furthermore, we introduce a regularization method for MoE to enhance the robustness of the model. Extensive experiments on two fraud detection benchmark datasets demonstrate that our method outperforms state-of-the-art baselines.

SYSep 11, 2018
Distributed Kalman Filter for A Class of Nonlinear Uncertain Systems: An Extended State Method

Xingkang He, Xiaocheng Zhang, Wenchao Xue et al.

This paper studies the distributed state estimation problem for a class of discrete-time stochastic systems with nonlinear uncertain dynamics over time-varying topologies of sensor networks. An extended state vector consisting of the original state and the nonlinear dynamics is constructed. By analyzing the extended system, we provide a design method for the filtering gain and fusion matrices, leading to the extended state distributed Kalman filter. It is shown that the proposed filter can provide the upper bound of estimation covariance in real time, which means the estimation accuracy can be evaluated online.It is proven that the estimation covariance of the filter is bounded under rather mild assumptions, i.e., collective observability of the system and jointly strong connectedness of network topologies. Numerical simulation shows the effectiveness of the proposed filter.