CV | Sep 21, 2025

CardiacCLIP: Video-based CLIP Adaptation for LVEF Prediction in a Few-shot Manner

arXiv:2509.17065v1 | h-index: 5 | Has Code | MICCAI
Originality: Incremental advance
AI Analysis

This work addresses the need for accurate cardiac assessment in clinical settings with limited annotated data, though it is an incremental advance, since it adapts existing CLIP models to echocardiography.

The paper tackles left ventricular ejection fraction (LVEF) prediction from echocardiography videos by proposing CardiacCLIP, a video-based framework with attention-based frame aggregation and multi-resolution input scaling, which reduces MAE by 2.07 on the EchoNet-Dynamic dataset in the 1-shot setting.

Echocardiography is a vital non-invasive modality for cardiac assessment, with left ventricular ejection fraction (LVEF) serving as a key indicator of heart function. Existing LVEF estimation methods depend on large-scale annotated video datasets, which are costly and limit adaptability across various clinical settings. Recent vision-language models for echocardiography, such as EchoCLIP, apply image-to-text pretraining but fail to capture crucial temporal dynamics and localized cardiac structures essential for accurate diagnosis. To address these challenges, we propose CardiacCLIP, a video-based framework that enhances LVEF prediction through attention-based frame aggregation and multi-resolution input scaling. Specifically, we introduce MFL (Multi Frame Learning), a novel attention-based mechanism for selectively fusing informative frames, and EchoZoom, a multi-scale feature extraction strategy that refines spatial representations of cardiac structures. As a novel adaptation of CLIP models for few-shot echocardiogram video analysis, our approach significantly improves diagnostic accuracy, reducing MAE by 2.07 on the EchoNet-Dynamic dataset under the 1-shot setting. The code is available at https://github.com/xmed-lab/CardiacCLIP.
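
To make the idea of attention-based frame aggregation concrete, the following is a minimal sketch in plain Python. It is not the authors' MFL implementation: the scaled dot-product scoring, the `query` vector (standing in for a learned parameter), and the function names are all assumptions chosen for illustration. It also includes the mean absolute error (MAE) metric used to report LVEF results.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    frames: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Fuse per-frame feature vectors into one video-level vector.

    `frames` holds one feature vector per video frame. `query` is a
    hypothetical stand-in for a learned attention parameter (an
    assumption, not taken from the paper).
    """
    dim = len(query)
    # Score each frame by its scaled dot product with the query.
    scores = [
        sum(f_j * q_j for f_j, q_j in zip(frame, query)) / math.sqrt(dim)
        for frame in frames
    ]
    weights = softmax(scores)
    # Weighted sum of frame features: higher-scoring frames contribute more.
    pooled = [
        sum(w * frame[j] for w, frame in zip(weights, frames))
        for j in range(dim)
    ]
    return pooled, weights


def mean_absolute_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """MAE between predicted and reference LVEF values."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)
```

With a zero query every frame receives the same weight, so the pooling reduces to a plain average; a query aligned with one frame's features shifts the weight toward that frame. In a real model the frame features would come from a CLIP-style image encoder and the pooled vector would feed an LVEF prediction head.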

Code Implementations: 1 repo
