Yin Xu

h-index: 17

Papers: 14

Citations: 136

Novelty: 48%

AI Score: 53

Ranked #29,336 of 201,326 authors (top 15%); #6,735 in LG (top 16%)

14 Papers

SY · Sep 21, 2017

Risk-limiting Load Restoration for Resilience Enhancement with Intermittent Energy Resources

Zhiwen Wang, Chen Shen, Yin Xu et al.

Microgrids are resources that can be used to restore critical loads after a natural disaster, enhancing the resilience of a distribution network. To deal with the stochastic nature of intermittent energy resources, such as wind turbines (WTs) and photovoltaics (PVs), many methods rely on forecast information. However, some microgrids may not be equipped with power forecasting tools. To fill this gap, a risk-limiting strategy based on measurements is proposed. A Gaussian mixture model (GMM) is used to represent a prior joint probability density function (PDF) of the power outputs of WTs and PVs over multiple periods. As time rolls forward, the distribution of WT/PV generation is updated recursively based on the latest measurement data. The updated distribution is used as an input for the risk-limiting load restoration problem, enabling an equivalent transformation of the original chance-constrained problem into a mixed-integer linear program (MILP). Simulation cases on a distribution system with three microgrids demonstrate the effectiveness of the proposed method. Results also indicate that networked microgrids have better uncertainty management capabilities than stand-alone microgrids.
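The measurement-based update described in this abstract can be illustrated with the standard rule for conditioning a Gaussian mixture on observed coordinates. This is only a minimal sketch, not the authors' implementation: the function names `gaussian_pdf` and `condition_gmm`, the two-component toy mixture, and the choice of which coordinate is "measured" are all assumptions made for illustration.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Density of a multivariate normal at x (plain NumPy, no SciPy)."""
    x, mean = np.atleast_1d(x), np.atleast_1d(mean)
    cov = np.atleast_2d(cov)
    d = x - mean
    norm = np.sqrt((2.0 * np.pi) ** len(mean) * np.linalg.det(cov))
    return float(np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / norm)

def condition_gmm(weights, means, covs, obs_idx, obs_val):
    """Condition a joint GMM on measured coordinates obs_idx = obs_val.

    Returns the weights, means and covariances of the GMM over the
    remaining (not yet measured) coordinates.
    """
    obs_idx = np.asarray(obs_idx)
    obs_val = np.atleast_1d(np.asarray(obs_val, dtype=float))
    rest = np.setdiff1d(np.arange(len(means[0])), obs_idx)
    new_w, new_m, new_c = [], [], []
    for w, mu, S in zip(weights, means, covs):
        mu, S = np.asarray(mu, float), np.asarray(S, float)
        S_oo = S[np.ix_(obs_idx, obs_idx)]
        S_ro = S[np.ix_(rest, obs_idx)]
        S_rr = S[np.ix_(rest, rest)]
        gain = S_ro @ np.linalg.inv(S_oo)
        # Per-component Gaussian conditioning.
        new_m.append(mu[rest] + gain @ (obs_val - mu[obs_idx]))
        new_c.append(S_rr - gain @ S_ro.T)
        # Reweight each component by how well it explains the measurement.
        new_w.append(w * gaussian_pdf(obs_val, mu[obs_idx], S_oo))
    new_w = np.array(new_w) / np.sum(new_w)
    return new_w, new_m, new_c

# Toy prior over (period-1 output, period-2 output); values are made up.
cov = [[1.0, 0.5], [0.5, 1.0]]
w, m, c = condition_gmm([0.5, 0.5], [[0.0, 0.0], [4.0, 4.0]], [cov, cov],
                        obs_idx=[0], obs_val=[0.0])
```

Applying the same function again as each new period is measured gives a recursive update in the spirit of the abstract; how the paper feeds the result into the MILP is not shown here.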

59.8 · AI · May 27

MACReD: A Multi-Agent Collaborative Reasoning Framework for Reaction Diagram Parsing

Chuang Tang, Chenhao Lin, Yin Xu et al.

Parsing chemical reaction diagrams from scientific literature is challenging due to heterogeneous layouts, intertwined visual elements, and the difficulty of integrating recognition and reasoning. Existing vision-language models advance multimodal understanding but still fail on complex diagrams, struggling to maintain spatial coherence and to integrate multidimensional information during reasoning. To address these issues, we propose MACReD, a hierarchical multi-agent framework that coordinates specialized agents for molecular perception, arrow understanding, text extraction, and reaction reconstruction within a unified VLM-guided architecture. The planning and perception layers use flexible, fine-grained detection to handle visual complexity, while the reasoning layer uses a multigraph fusion mechanism to integrate heterogeneous cues and enforce chemically consistent global reasoning. Experiments on the RxnScribe benchmark show that MACReD achieves state-of-the-art performance, with F1 scores of 75.2% and 84.6% under hard and soft match criteria, outperforming the RxnScribe baseline, which obtains 69.1% and 80.0%, respectively. These results demonstrate the robustness of MACReD across diverse diagram layouts, including multi-step and tree-structured reactions.

96.2 · IT · Apr 19

Node-Based Soft-Output Fast Successive Cancellation List Decoding of Polar Codes

Li Shen, Yongpeng Wu, Zhen Gao et al.

The soft-output successive cancellation list (SO-SCL) decoder provides a methodology for estimating the a-posteriori probability log-likelihood ratios by only leveraging the conventional SCL decoder of polar codes. However, the sequential decoding nature of SCL introduces high decoding latency to SO-SCL. In this paper, we incorporate node-based fast decoding into the SO-SCL framework. After addressing the challenge of soft output extraction in special node decoding, we propose the soft-output fast SCL (SO-FSCL) decoding algorithm, along with its log-domain implementation and hardware-friendly version. The proposed SO-FSCL decoder can be regarded as an add-on extension to the FSCL decoder, making it possible to choose whether to output only hard decisions like FSCL or to provide additional soft outputs. Latency and complexity analyses demonstrate that SO-FSCL can, for example, reduce decoding time steps by 81.8% (with unlimited resources), the number of additions by 41.3%, and the number of comparisons by 46.4%. Meanwhile, simulation results indicate that SO-FSCL delivers almost the same soft-output performance as SO-SCL, outperforming other soft-output polar decoders, especially in scenarios involving iterative decoding.

QUANT-PH · Jul 29, 2024

Quantum Long Short-Term Memory for Drug Discovery

Liang Zhang, Yin Xu, Mohan Wu et al.

Quantum computing combined with machine learning (ML) is a highly promising research area, with numerous studies demonstrating that quantum machine learning (QML) is expected to solve scientific problems more effectively than classical ML. In this work, we present Quantum Long Short-Term Memory (QLSTM), a QML architecture, and demonstrate its effectiveness in drug discovery. We evaluate QLSTM on five benchmark datasets (BBBP, BACE, SIDER, BCAP37, T-47D), and observe consistent performance gains over classical LSTM, with ROC-AUC improvements ranging from 3% to over 6%. Furthermore, QLSTM exhibits improved predictive accuracy as the number of qubits increases, and faster convergence than classical LSTM under the same training conditions. Notably, QLSTM maintains strong robustness against quantum computer noise, outperforming noise-free classical LSTM in certain settings. These findings highlight the potential of QLSTM as a scalable and noise-resilient model for scientific applications, particularly as quantum hardware continues to advance in qubit capacity and fidelity.

CV · May 30, 2025 · Code

DrVD-Bench: Do Vision-Language Models Reason Like Human Doctors in Medical Image Diagnosis?

Tianhong Zhou, Yin Xu, Yingtao Zhu et al.

Vision-language models (VLMs) exhibit strong zero-shot generalization on natural images and show early promise in interpretable medical image analysis. However, existing benchmarks do not systematically evaluate whether these models truly reason like human clinicians or merely imitate superficial patterns. To address this gap, we propose DrVD-Bench, the first multimodal benchmark for clinical visual reasoning. DrVD-Bench consists of three modules: Visual Evidence Comprehension, Reasoning Trajectory Assessment, and Report Generation Evaluation, comprising a total of 7,789 image-question pairs. Our benchmark covers 20 task types, 17 diagnostic categories, and five imaging modalities: CT, MRI, ultrasound, radiography, and pathology. DrVD-Bench is explicitly structured to reflect the clinical reasoning workflow from modality recognition to lesion identification and diagnosis. We benchmark 19 VLMs, including general-purpose and medical-specific, open-source and proprietary models, and observe that performance drops sharply as reasoning complexity increases. While some models begin to exhibit traces of human-like reasoning, they often still rely on shortcut correlations rather than grounded visual understanding. DrVD-Bench offers a rigorous and structured evaluation framework to guide the development of clinically trustworthy VLMs.

CL · May 6, 2025 · Code

TeleEval-OS: Performance evaluations of large language models for operations scheduling

Yanyan Wang, Yingying Wang, Junli Liang et al.

The rapid advancement of large language models (LLMs) has significantly propelled progress in artificial intelligence, demonstrating substantial application potential across multiple specialized domains. Telecommunications operation scheduling (OS) is a critical aspect of the telecommunications industry, involving the coordinated management of networks, services, risks, and human resources to optimize production scheduling and ensure unified service control. However, the inherent complexity and domain-specific nature of OS tasks, coupled with the absence of comprehensive evaluation benchmarks, have hindered thorough exploration of LLMs' application potential in this critical field. To address this research gap, we propose the first Telecommunications Operation Scheduling Evaluation Benchmark (TeleEval-OS). Specifically, this benchmark comprises 15 datasets across 13 subtasks, comprehensively simulating four key operational stages: intelligent ticket creation, intelligent ticket handling, intelligent ticket closure, and intelligent evaluation. To systematically assess the performance of LLMs on tasks of varying complexity, we categorize their capabilities in telecommunications operation scheduling into four hierarchical levels, arranged in ascending order of difficulty: basic NLP, knowledge Q&A, report generation, and report analysis. On TeleEval-OS, we leverage zero-shot and few-shot evaluation methods to comprehensively assess 10 open-source LLMs (e.g., DeepSeek-V3) and 4 closed-source LLMs (e.g., GPT-4o) across diverse scenarios. Experimental results demonstrate that open-source LLMs can outperform closed-source LLMs in specific scenarios, highlighting their significant potential and value in the field of telecommunications operation scheduling.

SY · Jan 6

Post-Earthquake Restoration of Electricity-Gas Distribution Systems with Damage Information Collection and Repair Vehicle Routing

Mingxuan Li, Wei Wei, Yin Xu et al.

Extreme events such as earthquakes pose significant threats to integrated electricity-gas distribution systems (IEGDS) by causing widespread damage. Existing restoration approaches typically assume full awareness of damage, which may not be true if monitoring and communication infrastructures are impaired. In such circumstances, field inspection is necessary. This paper presents a novel adaptive restoration framework for IEGDS, considering dynamic damage assessment and repair. The restoration problem is formulated as a partially observable Markov decision process (POMDP), capturing the gradually revealed contingency and the evolving impact of field crew actions. To address the computational challenges of POMDPs in real-time applications, an advanced belief tree search (BTS) algorithm is introduced. This algorithm enables crew members to continuously update their actions based on evolving belief states, leveraging comprehensive simulations to evaluate potential future trajectories and identify optimal inspection and repair strategies. Based on the BTS algorithm, a unified real-time decision-making framework is developed for IEGDS restoration. Case studies on two distinct IEGDS systems demonstrate the effectiveness and scalability of the proposed method. The results indicate that the proposed approach achieves an outage cost comparable to the ideal solution, and reduces the total outage cost by more than 15% compared to strategies based on stochastic programming and heuristic methods.

SP · Jun 9, 2025

Channel Estimation for RIS-Assisted mmWave Systems via Diffusion Models

Yang Wang, Yin Xu, Cixiao Zhang et al.

Reconfigurable intelligent surface (RIS) has been recognized as a promising technology for next-generation wireless communications. However, the performance of RIS-assisted systems critically depends on accurate channel state information (CSI). To address this challenge, this letter proposes a novel channel estimation method for RIS-aided millimeter-wave (mmWave) systems based on diffusion models (DMs). Specifically, the forward diffusion process of the original signal is formulated to model the received signal as a noisy observation within the framework of DMs. Subsequently, the channel estimation task is formulated as the reverse diffusion process, and a sampling algorithm based on denoising diffusion implicit models (DDIMs) is developed to enable effective inference. Furthermore, a lightweight neural network, termed BRCNet, is introduced to replace the conventional U-Net, significantly reducing the number of parameters and computational complexity. Extensive experiments conducted under various scenarios demonstrate that the proposed method consistently outperforms existing baselines.
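For readers unfamiliar with DDIM sampling, the reverse step that this abstract builds on can be sketched as follows. This shows only the generic deterministic DDIM update (the eta = 0 case) on a toy real-valued vector. The function name `ddim_step`, the noise-schedule values, and the use of the true noise in place of a trained network are assumptions; the paper's BRCNet and its conditioning on the received signal are not reproduced.

```python
import numpy as np

def ddim_step(x_t, eps_pred, abar_t, abar_prev):
    """One deterministic DDIM update (eta = 0).

    x_t       : current noisy sample
    eps_pred  : predicted noise (normally the output of a denoising network)
    abar_t    : cumulative signal coefficient at the current step
    abar_prev : cumulative signal coefficient at the previous (less noisy) step
    """
    # Estimate the clean sample implied by the predicted noise.
    x0_pred = (x_t - np.sqrt(1.0 - abar_t) * eps_pred) / np.sqrt(abar_t)
    # Re-noise that estimate to the previous noise level, reusing eps_pred.
    return np.sqrt(abar_prev) * x0_pred + np.sqrt(1.0 - abar_prev) * eps_pred

# Toy check with an oracle noise prediction (hypothetical values).
rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)
eps = rng.standard_normal(4)
abar_t, abar_prev = 0.5, 0.9
x_t = np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * eps
x_prev = ddim_step(x_t, eps, abar_t, abar_prev)
```

With a perfect noise prediction, each step moves the sample along the same trajectory toward `x0`; in practice a learned network supplies `eps_pred`.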

LG · May 15, 2025

AI2MMUM: AI-AI Oriented Multi-Modal Universal Model Leveraging Telecom Domain Large Model

Tianyu Jiao, Zhuoran Xiao, Yihang Huang et al.

Designing a 6G-oriented universal model capable of processing multi-modal data and executing diverse air interface tasks has emerged as a common goal in future wireless systems. Building on our prior work in communication multi-modal alignment and telecom large language model (LLM), we propose a scalable, task-aware artificial intelligence-air interface multi-modal universal model (AI2MMUM), which flexibly and effectively performs various physical layer tasks according to subtle task instructions. The LLM backbone provides robust contextual comprehension and generalization capabilities, while a fine-tuning approach is adopted to incorporate domain-specific knowledge. To enhance task adaptability, task instructions consist of fixed task keywords and learnable, implicit prefix prompts. Frozen radio modality encoders extract universal representations, and adapter layers subsequently bridge radio and language modalities. Moreover, lightweight task-specific heads are designed to directly output task objectives. Comprehensive evaluations demonstrate that AI2MMUM achieves SOTA performance across five representative physical environment/wireless channel-based downstream tasks using the WAIR-D and DeepMIMO datasets.

LG · Jul 11, 2025

SFedKD: Sequential Federated Learning with Discrepancy-Aware Multi-Teacher Knowledge Distillation

Haotian Xu, Jinrui Zhou, Xichong Zhang et al.

Federated Learning (FL) is a distributed machine learning paradigm which coordinates multiple clients to collaboratively train a global model via a central server. Sequential Federated Learning (SFL) is a newly-emerging FL training framework where the global model is trained in a sequential manner across clients. Since SFL can provide strong convergence guarantees under data heterogeneity, it has attracted significant research attention in recent years. However, experiments show that SFL suffers from severe catastrophic forgetting in heterogeneous environments, meaning that the model tends to forget knowledge learned from previous clients. To address this issue, we propose an SFL framework with discrepancy-aware multi-teacher knowledge distillation, called SFedKD, which selects multiple models from the previous round to guide the current round of training. In SFedKD, we extend the single-teacher Decoupled Knowledge Distillation approach to our multi-teacher setting and assign distinct weights to teachers' target-class and non-target-class knowledge based on the class distributional discrepancy between teacher and student data. Through this fine-grained weighting strategy, SFedKD can enhance model training efficacy while mitigating catastrophic forgetting. Additionally, to prevent knowledge dilution, we eliminate redundant teachers for the knowledge distillation and formalize it as a variant of the maximum coverage problem. Based on the greedy strategy, we design a complementary-based teacher selection mechanism to ensure that the selected teachers achieve comprehensive knowledge space coverage while reducing communication and computational costs. Extensive experiments show that SFedKD effectively overcomes catastrophic forgetting in SFL and outperforms state-of-the-art FL methods.
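The teacher selection step is described as a greedy strategy for a variant of the maximum coverage problem. The plain greedy rule for maximum coverage can be sketched as below. This is an illustrative assumption rather than the paper's mechanism: here each teacher is reduced to a hypothetical set of class labels it "covers", and `greedy_select` and the toy data are invented names and values.

```python
def greedy_select(teacher_classes, k):
    """Pick up to k teachers, each time taking the one that adds the most
    not-yet-covered classes (the classic greedy max-coverage heuristic)."""
    covered = set()
    chosen = []
    remaining = dict(teacher_classes)
    for _ in range(k):
        best = max(remaining,
                   key=lambda t: len(remaining[t] - covered),
                   default=None)
        # Stop early if nobody is left or nobody adds anything new.
        if best is None or not (remaining[best] - covered):
            break
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered

# Hypothetical teachers and the classes their local data contains.
teachers = {"a": {0, 1, 2}, "b": {2, 3}, "c": {0, 1}, "d": {4}}
chosen, covered = greedy_select(teachers, k=2)
```

In this toy run teacher "c" is never picked because everything it knows is already covered by "a", which mirrors the abstract's goal of dropping redundant teachers.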

CV · Jun 15, 2025

Rasterizing Wireless Radiance Field via Deformable 2D Gaussian Splatting

Mufan Liu, Cixiao Zhang, Qi Yang et al.

Modeling the wireless radiance field (WRF) is fundamental to modern communication systems, enabling key tasks such as localization, sensing, and channel estimation. Traditional approaches, which rely on empirical formulas or physical simulations, often suffer from limited accuracy or require strong scene priors. Recent neural radiance field (NeRF)-based methods improve reconstruction fidelity through differentiable volumetric rendering, but their reliance on computationally expensive multilayer perceptron (MLP) queries hinders real-time deployment. To overcome these challenges, we introduce Gaussian splatting (GS) to the wireless domain, leveraging its efficiency in modeling optical radiance fields to enable compact and accurate WRF reconstruction. Specifically, we propose SwiftWRF, a deformable 2D Gaussian splatting framework that synthesizes WRF spectra at arbitrary positions under single-sided transceiver mobility. SwiftWRF employs CUDA-accelerated rasterization to render spectra at over 100,000 fps and uses a lightweight MLP to model the deformation of 2D Gaussians, effectively capturing mobility-induced WRF variations. In addition to novel spectrum synthesis, the efficacy of SwiftWRF is further demonstrated by its applications in angle-of-arrival (AoA) and received signal strength indicator (RSSI) prediction. Experiments conducted on both real-world and synthetic indoor scenes demonstrate that SwiftWRF can reconstruct WRF spectra up to 500x faster than existing state-of-the-art methods, while significantly improving signal quality. The project page is https://evan-sudo.github.io/swiftwrf/.

LG · Jan 4, 2025

Diffusion Model-Based Data Synthesis Aided Federated Semi-Supervised Learning

Zhongwei Wang, Tong Wu, Zhiyong Chen et al.

Federated semi-supervised learning (FSSL) is primarily challenged by two factors: the scarcity of labeled data across clients and the non-independent and identically distributed (non-IID) nature of data among clients. In this paper, we propose a novel approach, diffusion model-based data synthesis aided FSSL (DDSA-FSSL), which utilizes a diffusion model (DM) to generate synthetic data, bridging the gap between heterogeneous local data distributions and the global data distribution. In DDSA-FSSL, clients address the scarcity of labeled data by employing a federated learning-trained classifier to perform pseudo labeling for unlabeled data. The DM is then collaboratively trained using both labeled and precision-optimized pseudo-labeled data, enabling clients to generate synthetic samples for classes that are absent in their labeled datasets. This process allows clients to generate more comprehensive synthetic datasets aligned with the global distribution. Extensive experiments conducted on multiple datasets and varying non-IID distributions demonstrate the effectiveness of DDSA-FSSL, e.g., it improves accuracy from 38.46% to 52.14% on the CIFAR-10 dataset with 10% labeled data.

IT · Oct 16, 2024

Two Birds with One Stone: Multi-Task Semantic Communications Systems over Relay Channel

Yujie Cao, Tong Wu, Zhiyong Chen et al.

In this paper, we propose a novel multi-task, multi-link relay semantic communications (MTML-RSC) scheme that enables the destination node to simultaneously perform image reconstruction and classification with one transmission from the source node. In the MTML-RSC scheme, the source node broadcasts a signal using semantic communications, and the relay node forwards the signal to the destination. We analyze the coupling relationship between the two tasks and the two links (source-to-relay and source-to-destination) and design a semantic-focused forward method for the relay node, where it selectively forwards only the semantics of the relevant class while ignoring others. At the destination, the node combines signals from both the source node and the relay node to perform classification, and then uses the classification result to assist in decoding the signal from the relay node for image reconstruction. Experimental results demonstrate that the proposed MTML-RSC scheme achieves significant performance gains, e.g., a 1.73 dB improvement in peak signal-to-noise ratio (PSNR) for image reconstruction and an increase in classification accuracy from 64.89% to 70.31%.

CR · Sep 17, 2019

Privacy-preserving Double Auction Mechanism Based on Homomorphic Encryption and Sorting Networks

Yin Xu, Zhili Chen, Hong Zhong

As an effective resource allocation approach, double auctions (DAs) have been extensively studied in electronic commerce. Most previous studies have focused on how to design strategy-proof DA mechanisms, while comparatively little research has addressed privacy and security issues. However, security, and privacy in particular, has become such a public concern that European governments have recently enacted laws to enforce privacy guarantees. In this paper, to address the privacy issue in electronic auctions, we concentrate on how to design a privacy-preserving mechanism for double auctions by employing Goldwasser-Micali homomorphic encryption and sorting networks. We achieve provable privacy such that the auctions do not reveal any bid information except the auction results, resulting in a strict privacy guarantee. Moreover, to achieve practical system performance, we compare different sorting algorithms and suggest using the faster ones. Experimental results show that the choice of sorting algorithm can strongly affect the performance of our mechanism, and demonstrate the practicality of our protocol for real-world applications in electronic commerce.
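A sorting network is attractive in this setting because its sequence of compare-exchange operations is fixed in advance and does not depend on the data. The sketch below illustrates that property on plaintext integers using an odd-even transposition network. It is an assumption-laden illustration only: the abstract does not say which networks were compared, and in the actual protocol each comparison would operate on encrypted bids rather than on Python integers. The names `comparator_schedule` and `apply_network` are hypothetical.

```python
def comparator_schedule(n):
    """Fixed list of (i, i+1) comparators for odd-even transposition sort.

    The schedule depends only on n, never on the values being sorted.
    """
    return [(i, i + 1)
            for r in range(n)                  # n rounds suffice for n inputs
            for i in range(r % 2, n - 1, 2)]   # alternate even and odd pairs

def apply_network(values, schedule):
    """Run every compare-exchange in order and return the sorted copy."""
    v = list(values)
    for i, j in schedule:
        # Compare-exchange: smaller value to position i, larger to position j.
        v[i], v[j] = min(v[i], v[j]), max(v[i], v[j])
    return v

bids = [5, 3, 9, 1, 7]  # made-up plaintext bids
sorted_bids = apply_network(bids, comparator_schedule(len(bids)))
```

Because the comparator list is the same for every input, an observer of the access pattern learns nothing about the bid ordering. This network uses on the order of n squared comparators, and faster networks exist with fewer, which is consistent with the abstract's point that the choice of sorting algorithm matters for performance.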