CVFeb 13, 2025

Latents of latents to delineate pixels: hybrid Matryoshka autoencoder-to-U-Net pairing for segmenting large medical images in GPU-poor and low-data regimes

arXiv:2502.08988v1
Originality Incremental advance
AI Analysis

This work addresses the challenge of pixel-level segmentation for medical images with limited computational resources and data, though it appears incremental as it builds on existing U-Net and autoencoder methods.

The paper tackled the problem of segmenting high-resolution medical images efficiently in GPU-poor and low-data settings by proposing a hybrid Matryoshka autoencoder-to-U-Net architecture, achieving a Mean IoU of 77.68% and Dice Coefficient of 86.91% on left ventricle segmentation in echocardiographic images, outperforming a baseline U-Net.

Medical images are often high-resolution and lose important detail if downsampled, making pixel-level methods such as semantic segmentation much less efficient if performed on a low-dimensional image. We propose a low-rank Matryoshka projection and a hybrid segmenting architecture that preserves important information while retaining sufficient pixel geometry for pixel-level tasks. We design the Matryoshka Autoencoder (MatAE-U-Net) which combines the hierarchical encoding of the Matryoshka Autoencoder with the spatial reconstruction capabilities of a U-Net decoder, leveraging multi-scale feature extraction and skip connections to enhance accuracy and generalisation. We apply it to the problem of segmenting the left ventricle (LV) in echocardiographic images using the Stanford EchoNet-D dataset, including 1,000 standardised video-mask pairs of cardiac ultrasound videos resized to 112x112 pixels. The MatAE-UNet model achieves a Mean IoU of 77.68\%, Mean Pixel Accuracy of 97.46\%, and Dice Coefficient of 86.91\%, outperforming the baseline U-Net, which attains a Mean IoU of 74.70\%, Mean Pixel Accuracy of 97.31\%, and Dice Coefficient of 85.20\%. The results highlight the potential of using the U-Net in the recursive Matroshka latent space for imaging problems with low-contrast such as echocardiographic analysis.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes