Xin Zhang

h-index: 26

5 papers

217 citations

Novelty: 55%

AI Score: 40

Ranked #73,796 of 194,257 authors (top 38%) · #756 in IV (top 17%)

5 Papers

5.5 · CL · Jan 8, 2024 · Code

SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems

Dong Zhang, Zhaowei Li, Pengyu Wang et al.

Human communication is a complex and diverse process that not only involves multiple factors such as language, commonsense, and cultural backgrounds but also requires the participation of multimodal information, such as speech. Large Language Model (LLM)-based multi-agent systems have demonstrated promising performance in simulating human society. Can we leverage LLM-based multi-agent systems to simulate human communication? However, current LLM-based multi-agent systems mainly rely on text as the primary medium. In this paper, we propose SpeechAgents, a multi-modal LLM-based multi-agent system designed for simulating human communication. SpeechAgents utilizes a multi-modal LLM as the control center for individual agents and employs multi-modal signals as the medium for messages exchanged among agents. Additionally, we propose Multi-Agent Tuning to enhance the multi-agent capabilities of the LLM without compromising general abilities. To strengthen and evaluate the effectiveness of human communication simulation, we build the Human-Communication Simulation Benchmark. Experimental results demonstrate that SpeechAgents can simulate human communication dialogues with consistent content, authentic rhythm, and rich emotions and demonstrate excellent scalability even with up to 25 agents, and can be applied to tasks such as drama creation and audio novel generation. Code and models will be open-sourced at https://github.com/0nutation/SpeechAgents

5.8 · AI · Feb 24, 2025

TabulaTime: A Novel Multimodal Deep Learning Framework for Advancing Acute Coronary Syndrome Prediction through Environmental and Clinical Data Integration

Xin Zhang, Liangxiu Han, Stephen White et al.

Acute Coronary Syndromes (ACS), including ST-segment elevation myocardial infarctions (STEMI) and non-ST-segment elevation myocardial infarctions (NSTEMI), remain a leading cause of mortality worldwide. Traditional cardiovascular risk scores rely primarily on clinical data, often overlooking environmental influences like air pollution that significantly impact heart health. Moreover, integrating complex time-series environmental data with clinical records is challenging. We introduce TabulaTime, a multimodal deep learning framework that enhances ACS risk prediction by combining clinical risk factors with air pollution data. TabulaTime features three key innovations: First, it integrates time-series air pollution data with clinical tabular data to improve prediction accuracy. Second, its PatchRWKV module automatically extracts complex temporal patterns, overcoming limitations of traditional feature engineering while maintaining linear computational complexity. Third, attention mechanisms enhance interpretability by revealing interactions between clinical and environmental factors. Experimental results show that TabulaTime improves prediction accuracy by over 20% compared to conventional models such as CatBoost, Random Forest, and LightGBM, with air pollution data alone contributing over a 10% improvement. Feature importance analysis identifies critical predictors including previous angina, systolic blood pressure, PM10, and NO2. Overall, TabulaTime bridges clinical and environmental insights, supporting personalized prevention strategies and informing public health policies to mitigate ACS risk.

4.1 · LG · Sep 15, 2025

OASIS: A Deep Learning Framework for Universal Spectroscopic Analysis Driven by Novel Loss Functions

Chris Young, Juejing Liu, Marie L. Mortensen et al.

The proliferation of spectroscopic data across various scientific and engineering fields necessitates automated processing. We introduce OASIS (Omni-purpose Analysis of Spectra via Intelligent Systems), a machine learning (ML) framework for technique-independent, automated spectral analysis, encompassing denoising, baseline correction, and comprehensive peak parameter (location, intensity, FWHM) retrieval without human intervention. OASIS achieves its versatility through models trained on a strategically designed synthetic dataset incorporating features from numerous spectroscopy techniques. Critically, the development of innovative, task-specific loss functions, such as the vicinity peak response (ViPeR) for peak localization, enabled the creation of compact yet highly accurate models from this dataset, validated with experimental data from Raman, UV-vis, and fluorescence spectroscopy. OASIS demonstrates significant potential for applications including in situ experiments, high-throughput optimization, and online monitoring. This study underscores the optimization of the loss function as a key resource-efficient strategy to develop high-performance ML models.

4.4 · IV · Oct 20, 2021

CXR-Net: An Encoder-Decoder-Encoder Multitask Deep Neural Network for Explainable and Accurate Diagnosis of COVID-19 pneumonia with Chest X-ray Images

Xin Zhang, Liangxiu Han, Tam Sobeih et al.

Accurate and rapid detection of COVID-19 pneumonia is crucial for optimal patient treatment. Chest X-Ray (CXR) is the first-line imaging test for COVID-19 pneumonia diagnosis as it is fast, cheap and easily accessible. Inspired by the success of deep learning (DL) in computer vision, many DL models have been proposed to detect COVID-19 pneumonia using CXR images. Unfortunately, these deep classifiers lack transparency in interpreting findings, which may limit their applications in clinical practice. The existing commonly used visual explanation methods are either too noisy or imprecise, with low resolution, and hence are unsuitable for diagnostic purposes. In this work, we propose a novel explainable deep learning framework (CXR-Net) for accurate COVID-19 pneumonia detection with an enhanced pixel-level visual explanation from CXR images. The proposed framework is based on a new Encoder-Decoder-Encoder multitask architecture, allowing for both disease classification and visual explanation. The method has been evaluated on real-world CXR datasets from both public and private data sources, including healthy, bacterial pneumonia, viral pneumonia and COVID-19 pneumonia cases. The experimental results demonstrate that the proposed method can achieve a satisfactory level of accuracy and provide fine-resolution classification activation maps for visual explanation in lung disease detection. The Average Accuracy, the Precision, Recall and F1-score of COVID-19 pneumonia reached 0.879, 0.985, 0.992 and 0.989, respectively. We have also found that using lung-segmented CXR images can help improve the performance of the model. The proposed method can provide a more detailed, high-resolution visual explanation for the classification decision compared to current state-of-the-art visual explanation methods, and has great potential to be used in clinical practice for COVID-19 pneumonia diagnosis.
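
As a quick consistency check on the metrics quoted above: the F1-score is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded precision and recall in the abstract. The function name `f1_score` is illustrative and not taken from the paper, and a small gap from the reported 0.989 is expected because the inputs are already rounded to three decimals.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values quoted in the abstract for the COVID-19 pneumonia class.
f1 = f1_score(0.985, 0.992)
print(round(f1, 3))  # prints 0.988
```

The recomputed value agrees with the reported F1-score to within rounding of the inputs.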

18.7 · IV · Aug 10, 2020

An Explainable 3D Residual Self-Attention Deep Neural Network for Joint Atrophy Localization and Alzheimer's Disease Diagnosis using Structural MRI

Xin Zhang, Liangxiu Han, Wenyong Zhu et al.

Computer-aided early diagnosis of Alzheimer's disease (AD) and its prodromal form, mild cognitive impairment (MCI), based on structural Magnetic Resonance Imaging (sMRI) has provided a cost-effective and objective way for early prevention and treatment of disease progression, leading to improved patient care. In this work, we have proposed a novel computer-aided approach for early diagnosis of AD by introducing an explainable 3D Residual Attention Deep Neural Network (3D ResAttNet) for end-to-end learning from sMRI scans. Different from the existing approaches, the novelty of our approach is three-fold: 1) A Residual Self-Attention Deep Neural Network has been proposed to capture local, global and spatial information of MR images to improve diagnostic performance; 2) An explanation method using Gradient-based Localization Class Activation mapping (Grad-CAM) has been introduced to improve the explainability of the proposed method; 3) This work has provided a full end-to-end learning solution for automated disease diagnosis. Our proposed 3D ResAttNet method has been evaluated on a large cohort of subjects from real datasets for two challenging classification tasks (i.e., Alzheimer's disease (AD) vs. Normal cohort (NC) and progressive MCI (pMCI) vs. stable MCI (sMCI)). The experimental results show that the proposed approach has a competitive advantage over the state-of-the-art models in terms of accuracy performance and generalizability. The explainable mechanism in our approach is able to identify and highlight the contribution of the important brain parts (e.g., hippocampus, lateral ventricle and most parts of the cortex) for transparent decisions.
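
For readers unfamiliar with Grad-CAM, the following is a minimal, framework-free sketch of the generic computation applied to a 3D feature volume. It is not the authors' implementation: the function name `grad_cam_3d`, the array shapes, and the random inputs are assumptions for illustration. In practice the activations and gradients would be taken from a convolutional layer of a trained network.

```python
import numpy as np


def grad_cam_3d(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Generic Grad-CAM over a 3D feature volume.

    activations, gradients: arrays of shape (K, D, H, W), where gradients
    holds d(class score) / d(activations) for the class being explained.
    Returns a (D, H, W) map scaled to [0, 1].
    """
    # Channel weights: global average of the gradients over the spatial axes.
    weights = gradients.mean(axis=(1, 2, 3))  # shape (K,)
    # Weighted sum of the feature maps, then ReLU to keep positive evidence.
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)
    # Scale to [0, 1] for display; leave an all-zero map unchanged.
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Hypothetical inputs standing in for a real layer's outputs and gradients.
rng = np.random.default_rng(0)
acts = rng.random((8, 4, 5, 5))
grads = rng.standard_normal((8, 4, 5, 5))
cam = grad_cam_3d(acts, grads)
print(cam.shape)  # prints (4, 5, 5)
```

The resulting low-resolution map is typically upsampled to the input scan size and overlaid on it to highlight regions that contributed to the decision.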