CV · Jul 2, 2021

Ultrasound Video Transformers for Cardiac Ejection Fraction Estimation

Hadrien Reynaud, Athanasios Vlontzos, Benjamin Hou, Arian Beqiri, Paul Leeson, Bernhard Kainz

arXiv:2107.00977v1 · 15.978 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for automated, fast, and accurate cardiac diagnosis tools that reduce clinicians' reliance on manual processing. The contribution appears incremental, however, since it adapts existing transformer architectures to a specific medical imaging task.

The paper tackles the problem of automating cardiac ejection fraction estimation from ultrasound videos, a task that typically requires manual expert analysis and is prone to observer variability. It proposes a transformer-based method that achieves an average frame distance of 3.36 frames for end-systolic (ES) and 7.17 frames for end-diastolic (ED) frame detection, and estimates the ejection fraction with a mean absolute error (MAE) of 5.95 and an R² of 0.52 in 0.15 seconds per video.
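To make the reported numbers concrete, the sketch below computes the three metrics on toy data. It assumes that "average frame distance" means the mean absolute difference, in frames, between predicted and ground-truth ES (or ED) frame indices across videos; the paper's exact definition may differ. The function names and the toy values are hypothetical and are not results from the paper.

```python
from statistics import mean


def _check(pred, true):
    # Predictions and references must be paired one-to-one.
    if len(pred) != len(true) or not pred:
        raise ValueError("pred and true must be non-empty and equal length")


def average_frame_distance(pred_idx, true_idx):
    """Mean |predicted frame - true frame| over videos (assumed definition)."""
    _check(pred_idx, true_idx)
    return mean(abs(p - t) for p, t in zip(pred_idx, true_idx))


def mae(pred, true):
    """Mean absolute error, e.g. between predicted and reference EF (%)."""
    _check(pred, true)
    return mean(abs(p - t) for p, t in zip(pred, true))


def r2_score(pred, true):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(pred, true)
    t_mean = mean(true)
    ss_res = sum((t - p) ** 2 for p, t in zip(pred, true))
    ss_tot = sum((t - t_mean) ** 2 for t in true)
    return 1.0 - ss_res / ss_tot


# Toy values for illustration only.
print(average_frame_distance([10, 25, 40], [12, 25, 37]))  # distances 2, 0, 3
print(mae([55.0, 60.0, 40.0], [50.0, 65.0, 45.0]))
print(r2_score([55.0, 60.0, 40.0], [50.0, 65.0, 45.0]))
```

Note that R² compares the model's squared error with that of always predicting the mean reference EF. An R² of 0.52 therefore means the model explains roughly half of the variance in the reference values.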

Cardiac ultrasound imaging is used to diagnose various heart diseases. Common analysis pipelines involve manual processing of the video frames by expert clinicians, which suffers from intra- and inter-observer variability. We propose a novel approach to ultrasound video analysis using a transformer architecture based on a Residual Auto-Encoder Network and a BERT model adapted for token classification, which enables videos of any length to be processed. We apply our model to the task of End-Systolic (ES) and End-Diastolic (ED) frame detection and the automated computation of the left ventricular ejection fraction. We achieve an average frame distance of 3.36 frames for ES and 7.17 frames for ED on videos of arbitrary length. Our end-to-end learnable approach can estimate the ejection fraction with an MAE of 5.95 and an R² of 0.52 in 0.15 s per video, showing that segmentation is not the only way to predict ejection fraction. Code and models are available at https://github.com/HReynaud/UVT.
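As background, the sketch below shows two pieces of the task. The first treats each frame as a token and turns per-frame classifier scores into ES and ED frame indices. The second is the standard clinical definition of ejection fraction, EF = (EDV − ESV) / EDV × 100, where EDV and ESV are the left ventricular volumes at end-diastole and end-systole. The model described here regresses EF end-to-end rather than measuring volumes, so the formula only defines the target quantity. The argmax selection rule, the `(es_score, ed_score)` input format, and the function names are hypothetical assumptions, not the authors' method.

```python
def pick_es_ed(frame_scores):
    """Hypothetical post-processing of per-frame (token) outputs.

    frame_scores: list of (es_score, ed_score) pairs, one per frame.
    Works for any video length because it only scans the sequence.
    Returns (es_index, ed_index): the highest-scoring frame for each class.
    """
    if not frame_scores:
        raise ValueError("video must contain at least one frame")
    frames = range(len(frame_scores))
    es_index = max(frames, key=lambda i: frame_scores[i][0])
    ed_index = max(frames, key=lambda i: frame_scores[i][1])
    return es_index, ed_index


def ejection_fraction(edv, esv):
    """Clinical EF in percent: (EDV - ESV) / EDV * 100."""
    if edv <= 0:
        raise ValueError("end-diastolic volume must be positive")
    return (edv - esv) / edv * 100.0


# Toy per-frame scores for a 4-frame clip (illustrative values only).
scores = [(0.1, 0.9), (0.2, 0.3), (0.8, 0.1), (0.3, 0.2)]
print(pick_es_ed(scores))               # ES at frame 2, ED at frame 0
print(ejection_fraction(120.0, 50.0))   # about 58.3 %
```

A clip that spans several heartbeats contains several ES and ED events. A single argmax per class keeps only one of each, so a real pipeline would need peak detection or per-frame labelling instead.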

