VidFuncta: Towards Generalizable Neural Representations for Ultrasound Videos
This work addresses the need for generalizable, efficient video analysis in medical imaging, specifically ultrasound. Its approach could improve clinical diagnostics, though the contribution appears incremental because it extends an existing method (Functa) to a new domain.
The paper tackles ultrasound video analysis, which is challenging due to non-standardized acquisition and operator bias. It proposes VidFuncta, a framework that uses implicit neural representations to encode videos into compact, time-resolved representations. VidFuncta outperforms 2D and 3D baselines on reconstruction and enables downstream tasks (ejection fraction prediction, B-line detection, and breast lesion classification) across three datasets.
Ultrasound is widely used in clinical care, yet standard deep learning methods often struggle with full video analysis due to non-standardized acquisition and operator bias. We offer a new perspective on ultrasound video analysis through implicit neural representations (INRs). We build on Functa, an INR framework in which each image is represented by a modulation vector that conditions a shared neural network. However, its extension to the temporal domain of medical videos remains unexplored. To address this gap, we propose VidFuncta, a novel framework that leverages Functa to encode variable-length ultrasound videos into compact, time-resolved representations. VidFuncta disentangles each video into a static video-specific vector and a sequence of time-dependent modulation vectors, capturing both temporal dynamics and dataset-level redundancies. Our method outperforms 2D and 3D baselines on video reconstruction and enables downstream tasks to directly operate on the learned 1D modulation vectors. We validate VidFuncta on three public ultrasound video datasets -- cardiac, lung, and breast -- and evaluate its downstream performance on ejection fraction prediction, B-line detection, and breast lesion classification. These results highlight the potential of VidFuncta as a generalizable and efficient representation framework for ultrasound videos. Our code is publicly available at https://github.com/JuliaWolleb/VidFuncta_public.
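To make the disentanglement described above concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single sine-activated hidden layer whose pre-activations are shifted by the sum of two linear projections, one of a static video vector and one of a per-frame modulation vector. All names (`SharedModulatedNet`, `p_video`, `p_frame`, `pooled_features`), the layer sizes, and the choice of additive shift modulation are illustrative assumptions. The weights are random, and the fitting of the vectors to a real video is omitted.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=1.0):
    """Random rows x cols matrix with entries in [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class SharedModulatedNet:
    """Hypothetical shared network: maps a pixel coordinate to an intensity.

    The weights are shared across all videos; only the two conditioning
    vectors change per video and per frame (an assumption for illustration).
    """

    def __init__(self, hidden=16, video_dim=8, frame_dim=4, seed=0):
        rng = random.Random(seed)
        self.w_in = make_matrix(hidden, 2, rng, scale=3.0)
        self.p_video = make_matrix(hidden, video_dim, rng, scale=0.5)
        self.p_frame = make_matrix(hidden, frame_dim, rng, scale=0.5)
        self.w_out = make_matrix(1, hidden, rng, scale=0.5)

    def pixel(self, x, y, video_vec, frame_vec):
        # Static (per-video) and time-dependent (per-frame) shifts are added
        # to the pre-activations of the shared layer.
        shift_v = matvec(self.p_video, video_vec)
        shift_t = matvec(self.p_frame, frame_vec)
        pre = matvec(self.w_in, [x, y])
        hidden = [math.sin(p + sv + st) for p, sv, st in zip(pre, shift_v, shift_t)]
        return matvec(self.w_out, hidden)[0]

    def render_frame(self, video_vec, frame_vec, size=8):
        # Evaluate the network on a size x size grid over [-1, 1]^2.
        coords = [-1.0 + 2.0 * i / (size - 1) for i in range(size)]
        return [[self.pixel(x, y, video_vec, frame_vec) for x in coords] for y in coords]


def render_video(net, video_vec, frame_vecs, size=8):
    """One frame per modulation vector, so any video length is supported."""
    return [net.render_frame(video_vec, f, size) for f in frame_vecs]


def pooled_features(video_vec, frame_vecs):
    """Hypothetical fixed-size 1D feature for a downstream model:
    the static vector concatenated with the temporal mean of the frame vectors."""
    n = len(frame_vecs)
    mean_frame = [sum(col) / n for col in zip(*frame_vecs)]
    return list(video_vec) + mean_frame


net = SharedModulatedNet()
video_vec = [0.1] * 8                               # static, video-specific
frame_vecs = [[0.0] * 4, [0.5] * 4, [1.0] * 4]      # one vector per frame
video = render_video(net, video_vec, frame_vecs)
features = pooled_features(video_vec, frame_vecs)
```

In this sketch a video of T frames is stored as one 8-dimensional vector plus T 4-dimensional vectors rather than as T pixel grids, which illustrates what "compact, time-resolved" means here. The mean pooling in `pooled_features` is only one simple way to obtain a fixed-size input from a variable-length sequence; the source does not state which downstream model or aggregation is used.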