SPJul 4, 2022
Attention mechanisms for physiological signal deep learning: which attention should we take?Seong-A Park, Hyung-Chul Lee, Chul-Woo Jung et al.
Attention mechanisms are widely used to dramatically improve deep learning model performance in various fields. However, their general ability to improve the performance of physiological signal deep learning model is immature. In this study, we experimentally analyze four attention mechanisms (e.g., squeeze-and-excitation, non-local, convolutional block attention module, and multi-head self-attention) and three convolutional neural network (CNN) architectures (e.g., VGG, ResNet, and Inception) for two representative physiological signal prediction tasks: the classification for predicting hypotension and the regression for predicting cardiac output (CO). We evaluated multiple combinations for performance and convergence of physiological signal deep learning model. Accordingly, the CNN models with the spatial attention mechanism showed the best performance in the classification problem, whereas the channel attention mechanism achieved the lowest error in the regression problem. Moreover, the performance and convergence of the CNN models with attention mechanisms were better than stand-alone self-attention models in both problems. Hence, we verified that convolutional operation and attention mechanisms are complementary and provide faster convergence time, despite the stand-alone self-attention models requiring fewer parameters.
39.9SEMar 16
Code Sharing In Prediction Model Research: A Scoping ReviewThomas Sounack, Raffaele Giancotti, Catherine A. Gao et al.
Analytical code is essential for reproducing diagnostic and prognostic prediction model research, yet code availability in the published literature remains limited. While the TRIPOD statements set standards for reporting prediction model methods, they do not define explicit standards for repository structure and documentation. This review quantifies current code-sharing practices to inform the development of TRIPOD-Code, a TRIPOD extension reporting guideline focused on code sharing. We conducted a scoping review of PubMed-indexed articles citing TRIPOD or TRIPOD+AI as of Aug 11, 2025, restricted to studies retrievable via the PubMed Central Open Access API. Eligible studies developed, updated, or validated multivariable prediction models. A large language model-assisted pipeline was developed to screen articles and extract code availability statements and repository links. Repositories were assessed with the same LLM against 14 predefined reproducibility-related features. Our code is made publicly available. Among 3,967 eligible articles, 12.2% included code sharing statements. Code sharing increased over time, reaching 15.8% in 2025, and was higher among TRIPOD+AI-citing studies than TRIPOD-citing studies. Sharing prevalence varied widely by journal and country. Repository assessment showed substantial heterogeneity in reproducibility features: most repositories contained a README file (80.5%), but fewer specified dependencies (37.6%; version-constrained 21.6%) or were modular (42.4%). In prediction model research, code sharing remains relatively uncommon, and when shared, often falls short of being reusable. These findings provide an empirical baseline for the TRIPOD-Code extension and underscore the need for clearer expectations beyond code availability, including documentation, dependency specification, licensing, and executable structure.
79.9CLApr 21
Coding-Free and Privacy-Preserving Agentic Framework for Data-Driven Clinical ResearchTaehun Kim, Hyeryun Park, Hyeonhoon Lee et al.
Clinical data-driven research requires clinical expertise, programming skills, access to patient data, and extensive documentation, creating barriers and slowing the pace for clinicians and external researchers. To address this, we developed the Clinical Agentic Research Intelligence System (CARIS) that automates the workflow: research planning, literature search, cohort construction, Institutional Review Board (IRB) documentation, Vibe Machine Learning (ML), and report generation, with human-in-the-loop refinement. CARIS integrates Large Language Models (LLMs) with modular tools through the Model Context Protocol (MCP), enabling natural language-driven research without coding while allowing users to access only outputs. We evaluated CARIS on three heterogeneous datasets with distinct clinical tasks, where it completed planning and IRB documentation within four iterations, supported Vibe ML, and generated reports, achieving 96% completeness in LLM-based evaluation and 82% in human evaluation. CARIS demonstrates potential to reduce documentation burden and technical barriers, accelerating data-driven clinical research across public and private data environments.
AIMar 4, 2025Code
EchoQA: A Large Collection of Instruction Tuning Data for Echocardiogram ReportsLama Moukheiber, Mira Moukheiber, Dana Moukheiiber et al.
We introduce a novel question-answering (QA) dataset using echocardiogram reports sourced from the Medical Information Mart for Intensive Care database. This dataset is specifically designed to enhance QA systems in cardiology, consisting of 771,244 QA pairs addressing a wide array of cardiac abnormalities and their severity. We compare large language models (LLMs), including open-source and biomedical-specific models for zero-shot evaluation, and closed-source models for zero-shot and three-shot evaluation. Our results show that fine-tuning LLMs improves performance across various QA metrics, validating the value of our dataset. Clinicians also qualitatively evaluate the best-performing model to assess the LLM responses for correctness. Further, we conduct fine-grained fairness audits to assess the bias-performance trade-off of LLMs across various social determinants of health. Our objective is to propel the field forward by establishing a benchmark for LLM AI agents aimed at supporting clinicians with cardiac differential diagnoses, thereby reducing the documentation burden that contributes to clinician burnout and enabling healthcare professionals to focus more on patient care.
LGSep 1, 2021Code
Developing and validating multi-modal models for mortality prediction in COVID-19 patients: a multi-center retrospective studyJoy Tzung-yu Wu, Miguel Ángel Armengol de la Hoz, Po-Chih Kuo et al.
The unprecedented global crisis brought about by the COVID-19 pandemic has sparked numerous efforts to create predictive models for the detection and prognostication of SARS-CoV-2 infections with the goal of helping health systems allocate resources. Machine learning models, in particular, hold promise for their ability to leverage patient clinical information and medical images for prediction. However, most of the published COVID-19 prediction models thus far have little clinical utility due to methodological flaws and lack of appropriate validation. In this paper, we describe our methodology to develop and validate multi-modal models for COVID-19 mortality prediction using multi-center patient data. The models for COVID-19 mortality prediction were developed using retrospective data from Madrid, Spain (N=2547) and were externally validated in patient cohorts from a community hospital in New Jersey, USA (N=242) and an academic center in Seoul, Republic of Korea (N=336). The models we developed performed differently across various clinical settings, underscoring the need for a guided strategy when employing machine learning for clinical decision-making. We demonstrated that using features from both the structured electronic health records and chest X-ray imaging data resulted in better 30-day-mortality prediction performance across all three datasets (areas under the receiver operating characteristic curves: 0.85 (95% confidence interval: 0.83-0.87), 0.76 (0.70-0.82), and 0.95 (0.92-0.98)). We discuss the rationale for the decisions made at every step in developing the models and have made our code available to the research community. We employed the best machine learning practices for clinical model development. Our goal is to create a toolkit that would assist investigators and organizations in building multi-modal models for prediction, classification and/or optimization.
CYFeb 23, 2025
Unmasking Societal Biases in Respiratory Support for ICU Patients through Social Determinants of HealthMira Moukheiber, Lama Moukheiber, Dana Moukheiber et al.
In critical care settings, where precise and timely interventions are crucial for health outcomes, evaluating disparities in patient outcomes is essential. Current approaches often fail to fully capture the impact of respiratory support interventions on individuals affected by social determinants of health. While attributes such as gender, race, and age are commonly assessed and provide valuable insights, they offer only a partial view of the complexities faced by diverse populations. In this study, we focus on two clinically motivated tasks: prolonged mechanical ventilation and successful weaning. Additionally, we conduct fairness audits on the models' predictions across demographic groups and social determinants of health to better understand health inequities in respiratory interventions within the intensive care unit. Furthermore, we release a temporal benchmark dataset, verified by clinical experts, to facilitate benchmarking of clinical respiratory intervention tasks.
AIOct 22, 2025
A Multi-faceted Analysis of Cognitive Abilities: Evaluating Prompt Methods with Large Language Models on the CONSORT ChecklistSohyeon Jeon, Hyung-Chul Lee
Despite the rapid expansion of Large Language Models (LLMs) in healthcare, robust and explainable evaluation of their ability to assess clinical trial reporting according to CONSORT standards remains an open challenge. In particular, uncertainty calibration and metacognitive reliability of LLM reasoning are poorly understood and underexplored in medical automation. This study applies a behavioral and metacognitive analytic approach using an expert-validated dataset, systematically comparing two representative LLMs - one general and one domain-specialized - across three prompt strategies. We analyze both cognitive adaptation and calibration error using metrics: Expected Calibration Error (ECE) and a baseline-normalized Relative Calibration Error (RCE) that enables reliable cross-model comparison. Our results reveal pronounced miscalibration and overconfidence in both models, especially under clinical role-playing conditions, with calibration error persisting above clinically relevant thresholds. These findings underscore the need for improved calibration, transparent code, and strategic prompt engineering to develop reliable and explainable medical AI.
SPAug 26, 2025
A Masked Representation Learning to Model Cardiac Functions Using Multiple Physiological SignalsSeong-A Park, Jong-Eui Chae, Sungdong Kim et al.
In clinical settings, monitoring hemodynamics is crucial for managing patient prognosis, necessitating the integrated analysis of multiple physiological signals. While recent research has analyzed single signals such as electrocardiography (ECG) or photoplethysmography (PPG), there has yet to be a proposal for an approach that encompasses the complex signal analysis required in actual clinical scenarios. In this study, we introduce the SNUPHY-M (Seoul National University hospital PHYsiological signal Masked representation learning) model extracts physiological features reflecting the electrical, pressure, and fluid characteristics of the cardiac cycle in the process of restoring three masked physiological signals based on self-supervised learning (SSL): ECG, PPG, and arterial blood pressure (ABP) signals. By employing multiple physical characteristics, the model can extract more enriched features only using non-invasive signals. We evaluated the model's performance in clinical downstream tasks such as hypotension, stroke volume, systolic blood pressure, diastolic blood pressure, and age prediction. Our results showed that the SNUPHY-M significantly outperformed supervised or SSL models, especially in prediction tasks using non-invasive signals. To the best of our knowledge, SNUPHY-M is the first model to apply multi-modal SSL to cardiovascular analysis involving ECG, PPG, and ABP signals. This approach effectively supports clinical decision-making and enables precise diagnostics, contributing significantly to the early diagnosis and management of hemodynamics without invasiveness.