Xiaoyu Song

IV
h-index11
11papers
150citations
Novelty46%
AI Score47

11 Papers

MMJan 5Code
Encoder-Free ECG-Language Models

William Han, Tony Chen, Chaojing Duan et al. · cmu

ECG-Language Models (ELMs) extend recent progress in Multimodal Large Language Models (MLLMs) to automated ECG interpretation. However, most ELMs follow Vision-Language Model (VLM) designs and depend on pretrained ECG encoders, adding architectural and training complexity. Inspired by encoder-free VLMs, we introduce ELF, an encoder-free ELM that replaces the ECG encoder with a single projection layer trained jointly with the LLM. Across five datasets, ELF matches or exceeds state-of-the-art ELMs that use far more complex encoders and training pipelines. We also test whether adding architectural biases to ELF improves performance and find that the single linear projection remains competitive. Finally, we show that ELF, and potentially other ELMs, often rely more on benchmark artifacts and language priors than ECG-derived information, highlighting limitations in current evaluation practices and ELM design. All data and code is available at https://github.com/willxxy/ECG-Bench.

IVApr 25, 2022
Multi-scale reconstruction of undersampled spectral-spatial OCT data for coronary imaging using deep learning

Xueshen Li, Shengting Cao, Hongshan Liu et al.

Coronary artery disease (CAD) is a cardiovascular condition with high morbidity and mortality. Intravascular optical coherence tomography (IVOCT) has been considered as an optimal imagining system for the diagnosis and treatment of CAD. Constrained by Nyquist theorem, dense sampling in IVOCT attains high resolving power to delineate cellular structures/ features. There is a trade-off between high spatial resolution and fast scanning rate for coronary imaging. In this paper, we propose a viable spectral-spatial acquisition method that down-scales the sampling process in both spectral and spatial domain while maintaining high quality in image reconstruction. The down-scaling schedule boosts data acquisition speed without any hardware modifications. Additionally, we propose a unified multi-scale reconstruction framework, namely Multiscale- Spectral-Spatial-Magnification Network (MSSMN), to resolve highly down-scaled (compressed) OCT images with flexible magnification factors. We incorporate the proposed methods into Spectral Domain OCT (SD-OCT) imaging of human coronary samples with clinical features such as stent and calcified lesions. Our experimental results demonstrate that spectral-spatial downscaled data can be better reconstructed than data that is downscaled solely in either spectral or spatial domain. Moreover, we observe better reconstruction performance using MSSMN than using existing reconstruction methods. Our acquisition method and multi-scale reconstruction framework, in combination, may allow faster SD-OCT inspection with high resolution during coronary intervention.

IVNov 12, 2022
Structural constrained virtual histology staining for human coronary imaging using deep learning

Xueshen Li, Hongshan Liu, Xiaoyu Song et al.

Histopathological analysis is crucial in artery characterization for coronary artery disease (CAD). However, histology requires an invasive and time-consuming process. In this paper, we propose to generate virtual histology staining using Optical Coherence Tomography (OCT) images to enable real-time histological visualization. We develop a deep learning network, namely Coronary-GAN, to transfer coronary OCT images to virtual histology images. With a special consideration on the structural constraints in coronary OCT images, our method achieves better image generation performance than the conventional GAN-based method. The experimental results indicate that Coronary-GAN generates virtual histology images that are similar to real histology images, revealing the human coronary layers.

IVJul 22, 2023
SCPAT-GAN: Structural Constrained and Pathology Aware Convolutional Transformer-GAN for Virtual Histology Staining of Human Coronary OCT images

Xueshen Li, Hongshan Liu, Xiaoyu Song et al.

There is a significant need for the generation of virtual histological information from coronary optical coherence tomography (OCT) images to better guide the treatment of coronary artery disease. However, existing methods either require a large pixel-wisely paired training dataset or have limited capability to map pathological regions. To address these issues, we proposed a structural constrained, pathology aware, transformer generative adversarial network, namely SCPAT-GAN, to generate virtual stained H&E histology from OCT images. The proposed SCPAT-GAN advances existing methods via a novel design to impose pathological guidance on structural layers using transformer-based network.

ROMar 30
A Foldable and Agile Soft Electromagnetic Robot for Multimodal Navigation in Confined and Unstructured Environments

Zhihao Lv, Xiaoyong Zhang, Mengfan Zhang et al.

Multimodal locomotion is crucial for an animal's adaptability in unstructured wild environments. Similarly, in the human gastrointestinal tract, characterized by viscoelastic mucus, complex rugae, and narrow sphincters like the cardia, multimodal locomotion is also essential for a small-scale soft robot to conduct tasks. Here, we introduce a small-scale compact, foldable, and robust soft electromagnetic robot (M-SEMR) with more than nine locomotion modes designed for such a scenario. Featuring a six-spoke elastomer body embedded with liquid metal channels and driven by Laplace forces under a static magnetic field, the M-SEMR is capable of rapid transitions (< 0.35 s) among different locomotion modes. It achieves exceptional agility, including high-speed rolling (818 mm/s, 26 BL/s), omnidirectional crawling, jumping, and swimming. Notably, the robot can fold to reduce its volume by 79%, enabling it to traverse confined spaces. We further validate its navigation capabilities on complex terrains, including discrete obstacles, viscoelastic gelatin surfaces, viscous fluids, and simulated biological tissues. This system offers a versatile strategy for developing high-mobility soft robots for future biomedical applications.

CVMay 7, 2022
Automatic Block-wise Pruning with Auxiliary Gating Structures for Deep Convolutional Neural Networks

Zhaofeng Si, Honggang Qi, Xiaoyu Song

Convolutional neural networks are prevailing in deep learning tasks. However, they suffer from massive cost issues when working on mobile devices. Network pruning is an effective method of model compression to handle such problems. This paper presents a novel structured network pruning method with auxiliary gating structures which assigns importance marks to blocks in backbone network as a criterion when pruning. Block-wise pruning is then realized by proposed voting strategy, which is different from prevailing methods who prune a model in small granularity like channel-wise. We further develop a three-stage training scheduling for the proposed architecture incorporating knowledge distillation for better performance. Our experiments demonstrate that our method can achieve state-of-the-arts compression performance for the classification tasks. In addition, our approach can integrate synergistically with other pruning methods by providing pretrained models, thus achieving a better performance than the unpruned model with over 93\% FLOPs reduced.

AIMar 4, 2022
Modeling and Validating Temporal Rules with Semantic Petri-Net for Digital Twins

Han Liu, Xiaoyu Song, Ge Gao et al.

Semantic rule checking on RDFS/OWL data has been widely used in the construction industry. At present, semantic rule checking is mainly performed on static models. There are still challenges in integrating temporal models and semantic models for combined rule checking. In this paper, Semantic Petri-Net (SPN) is proposed as a novel temporal modeling and validating method, which implements the states and transitions of the Colored Petri-Net directly based on RDFS and SPARQL, and realizes two-way sharing of knowledge between domain semantic webs and temporal models in the runtime. Several cases are provided to demonstrate the possible applications in digital twins with concurrent state changes and dependencies.

CLSep 30, 2025Code
Retrieval-Augmented Generation for Electrocardiogram-Language Models

Xiaoyu Song, William Han, Tony Chen et al. · cmu

Interest in generative Electrocardiogram-Language Models (ELMs) is growing, as they can produce textual responses conditioned on ECG signals and textual queries. Unlike traditional classifiers that output label probabilities, ELMs are more versatile, supporting domain-specific tasks (e.g., waveform analysis, diagnosis, prognosis) as well as general tasks (e.g., open-ended questions, dialogue). Retrieval-Augmented Generation (RAG), widely used in Large Language Models (LLMs) to ground LLM outputs in retrieved knowledge, helps reduce hallucinations and improve natural language generation (NLG). However, despite its promise, no open-source implementation or systematic study of RAG pipeline design for ELMs currently exists. To address this gap, we present the first open-source RAG pipeline for ELMs, along with baselines and ablation studies for NLG. Experiments on three public datasets show that ELMs with RAG consistently improves performance over non-RAG baselines and highlights key ELM design considerations. Our code is available at: https://github.com/willxxy/ECG-Bench.

AIMay 24, 2025
Signal, Image, or Symbolic: Exploring the Best Input Representation for Electrocardiogram-Language Models Through a Unified Framework

William Han, Chaojing Duan, Zhepeng Cen et al. · cmu

Recent advances have increasingly applied large language models (LLMs) to electrocardiogram (ECG) interpretation, giving rise to Electrocardiogram-Language Models (ELMs). Conditioned on an ECG and a textual query, an ELM autoregressively generates a free-form textual response. Unlike traditional classification-based systems, ELMs emulate expert cardiac electrophysiologists by issuing diagnoses, analyzing waveform morphology, identifying contributing factors, and proposing patient-specific action plans. To realize this potential, researchers are curating instruction-tuning datasets that pair ECGs with textual dialogues and are training ELMs on these resources. Yet before scaling ELMs further, there is a fundamental question yet to be explored: What is the most effective ECG input representation? In recent works, three candidate representations have emerged-raw time-series signals, rendered images, and discretized symbolic sequences. We present the first comprehensive benchmark of these modalities across 6 public datasets and 5 evaluation metrics. We find symbolic representations achieve the greatest number of statistically significant wins over both signal and image inputs. We further ablate the LLM backbone, ECG duration, and token budget, and we evaluate robustness to signal perturbations. We hope that our findings offer clear guidance for selecting input representations when developing the next generation of ELMs.

QUANT-PHAug 2, 2019
Machine-learning based three-qubit gate for realization of a Toffoli gate with cQED-based transmon systems

Sahar Daraeizadeh, Shavindra P. Premaratne, Xiaoyu Song et al.

We use machine learning techniques to design a 50 ns three-qubit flux-tunable controlled-controlled-phase gate with fidelity of >99.99% for nearest-neighbor coupled transmons in circuit quantum electrodynamics architectures. We explain our gate design procedure where we enforce realistic constraints, and analyze the new gate's robustness under decoherence, distortion, and random noise. Our controlled-controlled-phase gate in combination with two single-qubit gates realizes a Toffoli gate which is widely used in quantum circuits, logic synthesis, quantum error correction, and quantum games.

SEMar 27, 2013
Exponential-Condition-Based Barrier Certificate Generation for Safety Verification of Hybrid Systems

Hui Kong, Fei He, Xiaoyu Song et al.

A barrier certificate is an inductive invariant function which can be used for the safety verification of a hybrid system. Safety verification based on barrier certificate has the benefit of avoiding explicit computation of the exact reachable set which is usually intractable for nonlinear hybrid systems. In this paper, we propose a new barrier certificate condition, called Exponential Condition, for the safety verification of semi-algebraic hybrid systems. The most important benefit of Exponential Condition is that it has a lower conservativeness than the existing convex condition and meanwhile it possesses the property of convexity. On the one hand, a less conservative barrier certificate forms a tighter over-approximation for the reachable set and hence is able to verify critical safety properties. On the other hand, the property of convexity guarantees its solvability by semidefinite programming method. Some examples are presented to illustrate the effectiveness and practicality of our method.