EchoLVFM: One-Step Video Generation via Latent Flow Matching for Echocardiogram Synthesis

Emmanuel Oladokun, Sarina Thomas, Jurica Šprem, Vicente Grau

arXiv:2603.13967 · 55.6 · h-index: 3 · Has Code

Predicted impact: top 21% in IV · last 90 days · Originality: Incremental advance

AI Analysis

This work enables practical, controllable video synthesis for echocardiography, which can aid data augmentation and specialist training in cardiology. Its contribution is incremental, since it applies flow matching to a specific domain.

The paper tackles the generation of realistic echocardiogram videos with control over clinical parameters such as ejection fraction. It reports a ~50x improvement in sampling efficiency over multi-step flow baselines while maintaining visual fidelity and strong parameter adherence.

Echocardiography is widely used for assessing cardiac function, where clinically meaningful parameters such as left-ventricular ejection fraction (EF) play a central role in diagnosis and management. Generative models capable of synthesising realistic echocardiogram videos with explicit control over such parameters are valuable for data augmentation, counterfactual analysis, and specialist training. However, existing approaches typically rely on computationally expensive multi-step sampling and aggressive temporal normalisation, limiting efficiency and applicability to heterogeneous real-world data. We introduce EchoLVFM, a one-step latent video flow-matching framework for controllable echocardiogram generation. Operating in the latent space, EchoLVFM synthesises temporally coherent videos in a single inference step, achieving a $\mathbf{\sim 50\times}$ improvement in sampling efficiency compared to multi-step flow baselines while maintaining visual fidelity. The model supports global conditioning on clinical variables, demonstrated through precise control of EF, and enables reconstruction and counterfactual generation from partially observed sequences. A masked conditioning strategy further removes fixed-length constraints, allowing shorter sequences to be retained rather than discarded. We evaluate EchoLVFM on the CAMUS dataset under challenging single-frame conditioning. Quantitative and qualitative results demonstrate competitive video quality, strong EF adherence, and 57.9% discrimination accuracy by expert clinicians which is close to chance. These findings indicate that efficient, one-step flow matching can enable practical, controllable echocardiogram video synthesis without sacrificing fidelity. Code available at: https://github.com/EngEmmanuel/EchoLVFM
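To make the efficiency claim concrete, the following is a minimal sketch of why one-step sampling is cheaper than multi-step sampling in a flow-matching setting. It is not the authors' implementation: the velocity field is a hand-written toy (a straight-line flow toward a fixed target) standing in for a trained network, and the names `make_counting_velocity` and `euler_sample`, the array shapes, and the choice of 50 baseline steps are all assumptions made for illustration.

```python
import numpy as np

def make_counting_velocity(target):
    """Toy velocity field for a straight-line path toward `target`.

    Hypothetical stand-in for a trained network; it also counts its calls,
    since each call would be one network evaluation in a real sampler.
    """
    calls = {"n": 0}

    def velocity(x, t):
        calls["n"] += 1
        # Velocity that moves x to `target` exactly at t = 1.
        return (target - x) / (1.0 - t)

    return velocity, calls

def euler_sample(velocity, x0, n_steps):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * velocity(x, k * dt)
    return x

rng = np.random.default_rng(0)
target = rng.normal(size=(4, 8))  # assumed latent "video": 4 frames x 8 dims
noise = rng.normal(size=(4, 8))   # starting sample at t = 0

# Multi-step baseline (50 steps is an assumption, not taken from the paper).
v_multi, calls_multi = make_counting_velocity(target)
x_multi = euler_sample(v_multi, noise, n_steps=50)

# One-step sampling: a single velocity evaluation.
v_one, calls_one = make_counting_velocity(target)
x_one = euler_sample(v_one, noise, n_steps=1)

print("velocity evaluations:", calls_multi["n"], "vs", calls_one["n"])
print("both reach the target:",
      np.allclose(x_multi, target), np.allclose(x_one, target))
```

In this toy the flow is exactly straight, so a single Euler step already lands on the target and the extra 49 evaluations buy nothing. A learned velocity field is generally not straight, which is why one-step generation needs a model trained for it rather than a change to the sampler alone.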

