IV | CV | Jun 24, 2024

Are Vision xLSTM Embedded UNet More Reliable in Medical 3D Image Segmentation?

Pallabi Dutta, Soham Bose, Swalpa Kumar Roy, Sushmita Mitra

arXiv:2406.16993v3 | 10.312 citations | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for efficient and high-performing segmentation models for medical imaging, particularly for deployment on resource-constrained systems, though it appears incremental as it builds on existing hybrid architectures.

The paper tackles the problem of high computational costs in medical 3D image segmentation by proposing U-VixLSTM, a hybrid model combining CNNs with Vision-xLSTM blocks, which achieves superior performance on Synapse, ISIC, and ACDC datasets compared to state-of-the-art networks.

Segmentation strategies for medical images have evolved from an initial dependence on Convolutional Neural Networks (CNNs) to the current investigation of hybrid models that combine CNNs with Vision Transformers (ViTs). There is an increasing focus on architectures that are both high-performing and computationally efficient, and that can be deployed on remote systems with limited resources. Although transformers can capture global dependencies in the input space, they incur high computational and storage costs. This research investigates the integration of CNNs with Vision Extended Long Short-Term Memory (Vision-xLSTM) blocks by introducing the novel U-VixLSTM. The Vision-xLSTM blocks capture the temporal and global relationships within the patches extracted from the CNN feature maps. The convolutional feature reconstruction path upsamples the output volume from the Vision-xLSTM blocks to produce the segmentation output. Our primary objective is to propose Vision-xLSTM as an appropriate backbone for medical image segmentation, offering excellent performance with reduced computational costs. The U-VixLSTM exhibits superior performance compared to state-of-the-art networks on the publicly available Synapse, ISIC and ACDC datasets. Code is available at https://github.com/duttapallabi2907/U-VixLSTM
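The abstract says that patches are extracted from the CNN feature maps before the Vision-xLSTM blocks, and that a convolutional path later reconstructs a volume. The NumPy sketch below illustrates only that reshaping step, under stated assumptions: non-overlapping cubic patches, a channels-first `(C, D, H, W)` layout, and the hypothetical helper names `patchify_3d` and `unpatchify_3d`. It is not taken from the authors' repository and does not model the xLSTM blocks themselves.

```python
import numpy as np


def patchify_3d(feat, patch):
    """Split a (C, D, H, W) feature map into non-overlapping cubic patches.

    Returns a token sequence of shape (N, C * patch**3), where N is the
    number of patches. Layout and patch size are illustrative assumptions.
    """
    c, d, h, w = feat.shape
    if d % patch or h % patch or w % patch:
        raise ValueError("spatial dims must be divisible by the patch size")
    gd, gh, gw = d // patch, h // patch, w // patch
    x = feat.reshape(c, gd, patch, gh, patch, gw, patch)
    # Move the patch-grid axes to the front: (gd, gh, gw, c, p, p, p).
    x = x.transpose(1, 3, 5, 0, 2, 4, 6)
    return x.reshape(gd * gh * gw, c * patch ** 3)


def unpatchify_3d(tokens, channels, grid, patch):
    """Inverse of patchify_3d: rebuild the (C, D, H, W) volume from tokens."""
    gd, gh, gw = grid
    x = tokens.reshape(gd, gh, gw, channels, patch, patch, patch)
    # Interleave grid and in-patch axes again: (c, gd, p, gh, p, gw, p).
    x = x.transpose(3, 0, 4, 1, 5, 2, 6)
    return x.reshape(channels, gd * patch, gh * patch, gw * patch)


# Toy feature map: 2 channels, 4 x 4 x 4 voxels, patch size 2.
feat = np.arange(2 * 4 * 4 * 4, dtype=np.float32).reshape(2, 4, 4, 4)
tokens = patchify_3d(feat, patch=2)
restored = unpatchify_3d(tokens, channels=2, grid=(2, 2, 2), patch=2)
print(tokens.shape, np.array_equal(restored, feat))
```

In a hybrid model of this kind, a sequence model would consume `tokens` one patch at a time, and the reconstruction path would operate on the volume recovered from the sequence output.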

