CL · Aug 27, 2023
MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records
Scott L. Fleming, Alejandro Lozano, William J. Haberkorn et al. · Stanford
The ability of large language models (LLMs) to follow natural language instructions with human-level fluency suggests many opportunities in healthcare to reduce administrative burden and improve quality of care. However, evaluating LLMs on realistic text generation tasks for healthcare remains challenging. Existing question answering datasets for electronic health record (EHR) data fail to capture the complexity of information needs and documentation burdens experienced by clinicians. To address these challenges, we introduce MedAlign, a benchmark dataset of 983 natural language instructions for EHR data. MedAlign is curated by 15 clinicians (7 specialities), includes clinician-written reference responses for 303 instructions, and provides 276 longitudinal EHRs for grounding instruction-response pairs. We used MedAlign to evaluate 6 general domain LLMs, having clinicians rank the accuracy and quality of each LLM response. We found high error rates, ranging from 35% (GPT-4) to 68% (MPT-7B-Instruct), and an 8.3% drop in accuracy moving from 32k to 2k context lengths for GPT-4. Finally, we report correlations between clinician rankings and automated natural language generation metrics as a way to rank LLMs without human review. We make MedAlign available under a research data use agreement to enable LLM evaluations on tasks aligned with clinician needs and preferences.
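The last step of the abstract, comparing clinician rankings with automated metric scores, can be illustrated with a rank correlation. The sketch below uses Kendall's tau as one common choice; the abstract does not say which statistic was used, and the ranks and metric scores here are hypothetical placeholders, not values from MedAlign.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a between two equal-length score lists (no tie correction)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        # A pair is concordant when both lists order items i and j the same way.
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical data for six models (illustrative only, not from the paper).
clinician_rank = [1, 2, 3, 4, 5, 6]                   # 1 = best
metric_score = [0.71, 0.64, 0.66, 0.52, 0.50, 0.41]   # higher = better

# Negate the ranks so that "higher is better" holds for both lists.
tau = kendall_tau([-r for r in clinician_rank], metric_score)
print(f"Kendall tau between clinician ranking and metric: {tau:.3f}")
```

A tau close to 1 would suggest that the automated metric orders the models much as clinicians do. With only a handful of models the estimate is coarse, so it is best read as a rough signal.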
IV · Jul 29, 2025 · Code
ReXGroundingCT: A 3D Chest CT Dataset for Segmentation of Findings from Free-Text Reports
Mohammed Baharoon, Luyang Luo, Michael Moritz et al.
We introduce ReXGroundingCT, the first publicly available dataset linking free-text findings to pixel-level 3D segmentations in chest CT scans. The dataset includes 3,142 non-contrast chest CT scans paired with standardized radiology reports from CT-RATE. Construction followed a structured three-stage pipeline. First, GPT-4 was used to extract and standardize findings, descriptors, and metadata from reports originally written in Turkish and machine-translated into English. Second, GPT-4o-mini categorized each finding into a hierarchical ontology of lung and pleural abnormalities. Third, 3D annotations were produced for all CT volumes: the training set was quality-assured by board-certified radiologists, and the validation and test sets were fully annotated by board-certified radiologists. Additionally, a complementary chain-of-thought dataset was created to provide step-by-step hierarchical anatomical reasoning for localizing findings within the CT volume, using GPT-4o and localization coordinates derived from organ segmentation models. ReXGroundingCT contains 16,301 annotated entities across 8,028 text-to-3D-segmentation pairs, covering diverse radiological patterns from 3,142 non-contrast CT scans. About 79% of findings are focal abnormalities and 21% are non-focal. The dataset includes a public validation set of 50 cases and a private test set of 100 cases, both annotated by board-certified radiologists. The dataset establishes a foundation for enabling free-text finding segmentation and grounded radiology report generation in CT imaging. Model performance on the private test set is hosted on a public leaderboard at https://rexrank.ai/ReXGroundingCT. The dataset is available at https://huggingface.co/datasets/rajpurkarlab/ReXGroundingCT.
IV · Aug 23, 2025
Generating Synthetic Contrast-Enhanced Chest CT Images from Non-Contrast Scans Using Slice-Consistent Brownian Bridge Diffusion Network
Pouya Shiri, Xin Yi, Neel P. Mistry et al.
Contrast-enhanced computed tomography (CT) imaging is essential for diagnosing and monitoring thoracic diseases, including aortic pathologies. However, contrast agents pose risks such as nephrotoxicity and allergic-like reactions. The ability to generate high-fidelity synthetic contrast-enhanced CT angiography (CTA) images without contrast administration would be transformative, enhancing patient safety and accessibility while reducing healthcare costs. In this study, we propose the first bridge diffusion-based solution for synthesizing contrast-enhanced CTA images from non-contrast CT scans. Our approach builds on the Slice-Consistent Brownian Bridge Diffusion Model (SC-BBDM), leveraging its ability to model complex mappings while maintaining consistency across slices. Unlike conventional slice-wise synthesis methods, our framework preserves full 3D anatomical integrity while operating in a high-resolution 2D fashion, allowing seamless volumetric interpretation under a low memory budget. To ensure robust spatial alignment, we implement a comprehensive preprocessing pipeline that includes resampling, registration using the Symmetric Normalization method, and a sophisticated dilated segmentation mask to extract the aorta and surrounding structures. We create two datasets from the Coltea-Lung dataset: one containing only the aorta and another including both the aorta and heart, enabling a detailed analysis of anatomical context. We compare our approach against baseline methods on both datasets, demonstrating its effectiveness in preserving vascular structures while enhancing contrast fidelity.
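For readers unfamiliar with Brownian bridge diffusion, the following is a minimal sketch of the forward (noising) step that such models are generally built on. It is not the authors' SC-BBDM implementation. The variance schedule `2 * s * (m - m**2)` follows the Brownian Bridge Diffusion Model formulation as commonly described and should be treated as an assumption here, as should the names `bridge_variance`, `bridge_sample`, and the toy pixel values.

```python
import math
import random


def bridge_variance(t, T, s=1.0):
    """Noise variance at step t of T; zero at both endpoints (assumed schedule)."""
    m = t / T
    return 2.0 * s * (m - m * m)


def bridge_sample(x0, y, t, T, s=1.0, rng=random):
    """Sample an intermediate state between x0 (target) and y (source) at step t.

    x0 and y are flat lists of pixel intensities with the same length.
    The mean interpolates linearly from x0 (t=0) to y (t=T), and Gaussian
    noise is added with the bridge variance, which vanishes at both ends.
    """
    m = t / T
    std = math.sqrt(bridge_variance(t, T, s))
    return [(1.0 - m) * a + m * b + std * rng.gauss(0.0, 1.0)
            for a, b in zip(x0, y)]


# Toy example (hypothetical intensities, not real CT data).
contrast_pixels = [0.80, 0.75, 0.20]      # stands in for a contrast-enhanced slice
non_contrast_pixels = [0.30, 0.28, 0.19]  # stands in for the paired non-contrast slice

rng = random.Random(0)
midpoint = bridge_sample(contrast_pixels, non_contrast_pixels, t=500, T=1000, rng=rng)
print(midpoint)
```

Because both endpoints are pinned, the reverse process can start directly from the non-contrast image rather than from pure noise, which is the usual motivation for bridge-based image-to-image translation.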
CV · Nov 14, 2020
Automatic classification of multiple catheters in neonatal radiographs with deep learning
Robert D. E. Henderson, Xin Yi, Scott J. Adams et al.
We develop and evaluate a deep learning algorithm to classify multiple catheters on neonatal chest and abdominal radiographs. A convolutional neural network (CNN) was trained using a dataset of 777 neonatal chest and abdominal radiographs, with a split of 81%-9%-10% for training-validation-testing, respectively. We employed ResNet-50 (a CNN), pre-trained on ImageNet. Ground truth labelling was limited to tagging each image to indicate the presence or absence of endotracheal tubes (ETTs), nasogastric tubes (NGTs), and umbilical arterial and venous catheters (UACs, UVCs). The dataset included 561 images containing 2 or more catheters, 167 images with only one, and 49 with none. Performance was measured with average precision (AP), calculated as the area under the precision-recall curve. On our test data, the algorithm achieved an overall AP (95% confidence interval) of 0.977 (0.679-0.999) for NGTs, 0.989 (0.751-1.000) for ETTs, 0.979 (0.873-0.997) for UACs, and 0.937 (0.785-0.984) for UVCs. Performance was similar for the set of 58 test images containing 2 or more catheters, with an AP of 0.975 (0.255-1.000) for NGTs, 0.997 (0.009-1.000) for ETTs, 0.981 (0.797-0.998) for UACs, and 0.937 (0.689-0.990) for UVCs. Our network thus achieves strong performance in the simultaneous detection of these four catheter types. Radiologists may use such an algorithm as a time-saving mechanism to automate reporting of catheters on radiographs.
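Average precision, the metric reported above, can be computed per catheter type from binary presence labels and predicted scores. The sketch below uses the common step-wise approximation to the area under the precision-recall curve; the abstract does not specify the exact integration rule the authors used, and the labels and scores are hypothetical.

```python
def average_precision(labels, scores):
    """Step-wise average precision for one class.

    labels: list of 0/1 ground-truth tags (1 = catheter present).
    scores: list of predicted scores, higher = more confident.
    Ties in scores are broken by input order (no special tie handling).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")
    # Rank images from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_pos = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_pos += 1
            # Precision at the rank where recall increases by 1 / n_pos.
            total += true_pos / rank
    return total / n_pos


# Hypothetical predictions for one catheter type on five test images.
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
ap = average_precision(labels, scores)
print(f"AP = {ap:.3f}")
```

In a multi-label setting such as the one described, the same function would be applied once per catheter type (NGT, ETT, UAC, UVC), each with its own label column and score column.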
IV · Feb 9, 2020
Computer-Aided Assessment of Catheters and Tubes on Radiographs: How Good is Artificial Intelligence for Assessment?
Xin Yi, Scott J. Adams, Robert D. E. Henderson et al.
Catheters are the second most common abnormal finding on radiographs. The position of catheters must be assessed on all radiographs, as serious complications can arise if catheters are malpositioned. However, due to the large number of radiographs performed each day, there can be substantial delays between the time a radiograph is performed and when it is interpreted by a radiologist. Computer-aided approaches hold the potential to assist in prioritizing radiographs with potentially malpositioned catheters for interpretation and automatically insert text indicating the placement of catheters in radiology reports, thereby improving radiologists' efficiency. After 50 years of research in computer-aided diagnosis, there is still a paucity of study in this area. With the development of deep learning approaches, the problem of catheter assessment is far more solvable. Therefore, we have performed a review of current algorithms and identified key challenges in building a reliable computer-aided diagnosis system for assessment of catheters on radiographs. This review may serve to further the development of machine learning approaches for this important use case.