CVMar 13
Opportunistic Cardiac Health Assessment: Estimating Phenotypes from Localizer MRI through Multi-Modal RepresentationsBusra Nur Zeybek, Özgün Turgut, Yundi Zhang et al.
Cardiovascular diseases are the leading cause of death. Cardiac phenotypes (CPs), e.g., ejection fraction, are the gold standard for assessing cardiac health, but they are derived from cine cardiac magnetic resonance imaging (CMR), which is costly and requires high spatio-temporal resolution. Every magnetic resonance (MR) examination begins with rapid and coarse localizers for scan planning, which are discarded thereafter. Despite non-diagnostic image quality and lack of temporal information, localizers can provide valuable structural information rapidly. In addition to imaging, patient-level information, including demographics and lifestyle, influence the cardiac health assessment. Electrocardiograms (ECGs) are inexpensive, routinely ordered in clinical practice, and capture the temporal activity of the heart. Here, we introduce C-TRIP (Cardiac Tri-modal Representations for Imaging Phenotypes), a multi-modal framework that aligns localizer MRI, ECG signals, and tabular metadata to learn a robust latent space and predict CPs using localizer images as an opportunistic alternative to CMR. By combining these three modalities, we leverage cheap spatial and temporal information from localizers, and ECG, respectively while benefiting from patient-specific context provided by tabular data. Our pipeline consists of three stages. First, encoders are trained independently to learn uni-modal representations. The second stage fuses the pre-trained encoders to unify the latent space. The final stage uses the enriched representation space for CP prediction, with inference performed exclusively on localizer MRI. Proposed C-TRIP yields accurate functional CPs, and high correlations for structural CPs. Since localizers are inherently rapid and low-cost, our C-TRIP framework could enable better accessibility for CP estimation.
CVMar 10
No Image, No Problem: End-to-End Multi-Task Cardiac Analysis from Undersampled k-SpaceYundi Zhang, Sevgi Gokce Kafali, Niklas Bubeck et al.
Conventional clinical CMR pipelines rely on a sequential "reconstruct-then-analyze" paradigm, forcing an ill-posed intermediate step that introduces avoidable artifacts and information bottlenecks. This creates a fundamental mathematical paradox: it attempts to recover high-dimensional pixel arrays (i.e., images) from undersampled k-space, rather than directly extracting the low-dimensional physiological labels actually required for diagnosis. To unlock the direct diagnostic potential of k-space, we propose k-MTR (k-space Multi-Task Representation), a k-space representation learning framework that aligns undersampled k-space data and fully-sampled images into a shared semantic manifold. Leveraging a large-scale controlled simulation of 42,000 subjects, k-MTR forces the k-space encoder to restore anatomical information lost to undersampling directly within the latent space, bypassing the explicit inverse problem for downstream analysis. We demonstrate that this latent alignment enables the dense latent space embedded with high-level physiological semantics directly from undersampled frequencies. Across continuous phenotype regression, disease classification, and anatomical segmentation, k-MTR achieves highly competitive performance against state-of-the-art image-domain baselines. By showcasing that precise spatial geometries and multi-task features can be successfully recovered directly from the k-space representations, k-MTR provides a robust architectural blueprint for task-aware cardiac MRI workflows.
LGJul 30, 2025Code
Beyond Benchmarks: Dynamic, Automatic And Systematic Red-Teaming Agents For Trustworthy Medical Language ModelsJiazhen Pan, Bailiang Jian, Paul Hager et al.
Ensuring the safety and reliability of large language models (LLMs) in clinical practice is critical to prevent patient harm and promote trustworthy healthcare applications of AI. However, LLMs are advancing so rapidly that static safety benchmarks often become obsolete upon publication, yielding only an incomplete and sometimes misleading picture of model trustworthiness. We demonstrate that a Dynamic, Automatic, and Systematic (DAS) red-teaming framework that continuously stress-tests LLMs can reveal significant weaknesses of current LLMs across four safety-critical domains: robustness, privacy, bias/fairness, and hallucination. A suite of adversarial agents is applied to autonomously mutate test cases, identify/evolve unsafe-triggering strategies, and evaluate responses, uncovering vulnerabilities in real time without human intervention. Applying DAS to 15 proprietary and open-source LLMs revealed a stark contrast between static benchmark performance and vulnerability under adversarial pressure. Despite a median MedQA accuracy exceeding 80\%, 94\% of previously correct answers failed our dynamic robustness tests. We observed similarly high failure rates across other domains: privacy leaks were elicited in 86\% of scenarios, cognitive-bias priming altered clinical recommendations in 81\% of fairness tests, and we identified hallucination rates exceeding 66\% in widely used models. Such profound residual risks are incompatible with routine clinical practice. By converting red-teaming from a static checklist into a dynamic stress-test audit, DAS red-teaming offers the surveillance that hospitals/regulators/technology vendors require as LLMs become embedded in patient chatbots, decision-support dashboards, and broader healthcare workflows. Our framework delivers an evolvable, scalable, and reliable safeguard for the next generation of medical AI.
CVAug 4, 2025Code
Whole-body Representation Learning For Competing Preclinical Disease Risk AssessmentDmitrii Seletkov, Sophie Starck, Ayhan Can Erdur et al.
Reliable preclinical disease risk assessment is essential to move public healthcare from reactive treatment to proactive identification and prevention. However, image-based risk prediction algorithms often consider one condition at a time and depend on hand-crafted features obtained through segmentation tools. We propose a whole-body self-supervised representation learning method for the preclinical disease risk assessment under a competing risk modeling. This approach outperforms whole-body radiomics in multiple diseases, including cardiovascular disease (CVD), type 2 diabetes (T2D), chronic obstructive pulmonary disease (COPD), and chronic kidney disease (CKD). Simulating a preclinical screening scenario and subsequently combining with cardiac MRI, it sharpens further the prediction for CVD subgroups: ischemic heart disease (IHD), hypertensive diseases (HD), and stroke. The results indicate the translational potential of whole-body representations as a standalone screening modality and as part of a multi-modal framework within clinical workflows for early personalized risk stratification. The code is available at https://github.com/yayapa/WBRLforCR/
CVSep 8, 2025
Does DINOv3 Set a New Medical Vision Standard?Che Liu, Yinda Chen, Haoyuan Shi et al.
The advent of large-scale vision foundation models, pre-trained on diverse natural images, has marked a paradigm shift in computer vision. However, how the frontier vision foundation models' efficacies transfer to specialized domains remains such as medical imaging remains an open question. This report investigates whether DINOv3, a state-of-the-art self-supervised vision transformer (ViT) that features strong capability in dense prediction tasks, can directly serve as a powerful, unified encoder for medical vision tasks without domain-specific pre-training. To answer this, we benchmark DINOv3 across common medical vision tasks, including 2D/3D classification and segmentation on a wide range of medical imaging modalities. We systematically analyze its scalability by varying model sizes and input image resolutions. Our findings reveal that DINOv3 shows impressive performance and establishes a formidable new baseline. Remarkably, it can even outperform medical-specific foundation models like BiomedCLIP and CT-Net on several tasks, despite being trained solely on natural images. However, we identify clear limitations: The model's features degrade in scenarios requiring deep domain specialization, such as in Whole-Slide Pathological Images (WSIs), Electron Microscopy (EM), and Positron Emission Tomography (PET). Furthermore, we observe that DINOv3 does not consistently obey scaling law in the medical domain; performance does not reliably increase with larger models or finer feature resolutions, showing diverse scaling behaviors across tasks. Ultimately, our work establishes DINOv3 as a strong baseline, whose powerful visual features can serve as a robust prior for multiple complex medical tasks. This opens promising future directions, such as leveraging its features to enforce multiview consistency in 3D reconstruction.
IVApr 17, 2025
Towards Cardiac MRI Foundation Models: Comprehensive Visual-Tabular Representations for Whole-Heart Assessment and BeyondYundi Zhang, Paul Hager, Che Liu et al.
Cardiac magnetic resonance imaging is the gold standard for non-invasive cardiac assessment, offering rich spatio-temporal views of the cardiac anatomy and physiology. Patient-level health factors, such as demographics, metabolic, and lifestyle, are known to substantially influence cardiovascular health and disease risk, yet remain uncaptured by CMR alone. To holistically understand cardiac health and to enable the best possible interpretation of an individual's disease risk, CMR and patient-level factors must be jointly exploited within an integrated framework. Recent multi-modal approaches have begun to bridge this gap, yet they often rely on limited spatio-temporal data and focus on isolated clinical tasks, thereby hindering the development of a comprehensive representation for cardiac health evaluation. To overcome these limitations, we introduce ViTa, a step toward foundation models that delivers a comprehensive representation of the heart and a precise interpretation of individual disease risk. Leveraging data from 42,000 UK Biobank participants, ViTa integrates 3D+T cine stacks from short-axis and long-axis views, enabling a complete capture of the cardiac cycle. These imaging data are then fused with detailed tabular patient-level factors, enabling context-aware insights. This multi-modal paradigm supports a wide spectrum of downstream tasks, including cardiac phenotype and physiological feature prediction, segmentation, and classification of cardiac and metabolic diseases within a single unified framework. By learning a shared latent representation that bridges rich imaging features and patient context, ViTa moves beyond traditional, task-specific models toward a universal, patient-specific understanding of cardiac health, highlighting its potential to advance clinical utility and scalability in cardiac analysis.
IVJul 25, 2025
Reconstruct or Generate: Exploring the Spectrum of Generative Modeling for Cardiac MRINiklas Bubeck, Yundi Zhang, Suprosanna Shit et al.
In medical imaging, generative models are increasingly relied upon for two distinct but equally critical tasks: reconstruction, where the goal is to restore medical imaging (usually inverse problems like inpainting or superresolution), and generation, where synthetic data is created to augment datasets or carry out counterfactual analysis. Despite shared architecture and learning frameworks, they prioritize different goals: generation seeks high perceptual quality and diversity, while reconstruction focuses on data fidelity and faithfulness. In this work, we introduce a "generative model zoo" and systematically analyze how modern latent diffusion models and autoregressive models navigate the reconstruction-generation spectrum. We benchmark a suite of generative models across representative cardiac medical imaging tasks, focusing on image inpainting with varying masking ratios and sampling strategies, as well as unconditional image generation. Our findings show that diffusion models offer superior perceptual quality for unconditional generation but tend to hallucinate as masking ratios increase, whereas autoregressive models maintain stable perceptual performance across masking levels, albeit with generally lower fidelity.
IVJun 1, 2024
Whole Heart 3D+T Representation Learning Through Sparse 2D Cardiac MR ImagesYundi Zhang, Chen Chen, Suprosanna Shit et al.
Cardiac Magnetic Resonance (CMR) imaging serves as the gold-standard for evaluating cardiac morphology and function. Typically, a multi-view CMR stack, covering short-axis (SA) and 2/3/4-chamber long-axis (LA) views, is acquired for a thorough cardiac assessment. However, efficiently streamlining the complex, high-dimensional 3D+T CMR data and distilling compact, coherent representation remains a challenge. In this work, we introduce a whole-heart self-supervised learning framework that utilizes masked imaging modeling to automatically uncover the correlations between spatial and temporal patches throughout the cardiac stacks. This process facilitates the generation of meaningful and well-clustered heart representations without relying on the traditionally required, and often costly, labeled data. The learned heart representation can be directly used for various downstream tasks. Furthermore, our method demonstrates remarkable robustness, ensuring consistent representations even when certain CMR planes are missing/flawed. We train our model on 14,000 unlabeled CMR data from UK BioBank and evaluate it on 1,000 annotated data. The proposed method demonstrates superior performance to baselines in tasks that demand comprehensive 3D+T cardiac information, e.g. cardiac phenotype (ejection fraction and ventricle volume) prediction and multi-plane/multi-frame CMR segmentation, highlighting its effectiveness in extracting comprehensive cardiac features that are both anatomically and pathologically relevant.