CV AINov 5, 2025

Generative deep learning for foundational video translation in ultrasound

Nikolina Tomic Roshni Bhatnagar, Sarthak Jain, Connor Lau, Tien-Yu Liu, Laura Gambini, Rima Arnaout

arXiv:2511.03255v13.6h-index: 3

Originality Synthesis-oriented

AI Analysis

This addresses data imbalance problems in medical imaging for ultrasound researchers and clinicians, though it appears incremental as an application of existing generative techniques to a specific medical domain.

The researchers tackled the challenge of imbalanced ultrasound sub-modalities by developing a generative method for translating color flow doppler to greyscale videos, achieving high similarity scores (SSIM 0.91+/-0.04) and synthetic videos that performed indistinguishably from real ones in classification and segmentation tasks.

Deep learning (DL) has the potential to revolutionize image acquisition and interpretation across medicine, however, attention to data imbalance and missingness is required. Ultrasound data presents a particular challenge because in addition to different views and structures, it includes several sub-modalities-such as greyscale and color flow doppler (CFD)-that are often imbalanced in clinical studies. Image translation can help balance datasets but is challenging for ultrasound sub-modalities to date. Here, we present a generative method for ultrasound CFD-greyscale video translation, trained on 54,975 videos and tested on 8,368. The method developed leveraged pixel-wise, adversarial, and perceptual loses and utilized two networks: one for reconstructing anatomic structures and one for denoising to achieve realistic ultrasound imaging. Average pairwise SSIM between synthetic videos and ground truth was 0.91+/-0.04. Synthetic videos performed indistinguishably from real ones in DL classification and segmentation tasks and when evaluated by blinded clinical experts: F1 score was 0.9 for real and 0.89 for synthetic videos; Dice score between real and synthetic segmentation was 0.97. Overall clinician accuracy in distinguishing real vs synthetic videos was 54+/-6% (42-61%), indicating realistic synthetic videos. Although trained only on heart videos, the model worked well on ultrasound spanning several clinical domains (average SSIM 0.91+/-0.05), demonstrating foundational abilities. Together, these data expand the utility of retrospectively collected imaging and augment the dataset design toolbox for medical imaging.

View on arXiv PDF

Similar