CV | Apr 25, 2024

Multimodal Information Interaction for Medical Image Segmentation

arXiv:2404.16371v1 | 5 citations | h-index: 3 | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of integrating relevant information across modalities in medical image segmentation, with potential for broader application. It appears incremental, however, because it builds on existing multimodal fusion approaches.

The paper tackles the problem of effectively fusing multimodal features for medical image segmentation by introducing the Multimodal Information Cross Transformer (MicFormer). On the MM-WHS dataset, it raises the whole-heart segmentation DICE score to 85.57 and MIoU to 75.51, outperforming other multimodal segmentation methods by margins of 2.83 and 4.23, respectively.
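For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how a mean Dice score and mean IoU can be computed from integer label maps. The function name `dice_and_iou`, the choice to average over foreground classes only, and the scaling to percent are assumptions for illustration; the source does not state the exact evaluation protocol used for MM-WHS.

```python
import numpy as np

def dice_and_iou(pred, target, num_classes):
    """Mean Dice and mean IoU (in percent) over foreground classes 1..num_classes-1.

    Assumption: label 0 is background and is excluded from the average.
    """
    dices, ious = [], []
    for c in range(1, num_classes):
        p = pred == c                      # voxels predicted as class c
        t = target == c                    # voxels labelled as class c
        total = p.sum() + t.sum()
        if total == 0:
            continue                       # class absent from both maps: skip it
        inter = np.logical_and(p, t).sum()
        union = np.logical_or(p, t).sum()
        dices.append(2.0 * inter / total)  # Dice = 2|P∩T| / (|P| + |T|)
        ious.append(inter / union)         # IoU  = |P∩T| / |P∪T|
    if not dices:
        return float("nan"), float("nan")
    return 100.0 * float(np.mean(dices)), 100.0 * float(np.mean(ious))

# Toy 2D example with two foreground classes (hypothetical data).
pred = np.array([[1, 1, 0],
                 [2, 2, 0]])
target = np.array([[1, 0, 0],
                   [2, 2, 2]])
mean_dice, mean_iou = dice_and_iou(pred, target, num_classes=3)
```

For a single class the two metrics are related by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU. That relation does not carry over exactly to averages over classes.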

The use of multimodal data in assisted diagnosis and segmentation has emerged as a prominent area of interest in current research. However, one of the primary challenges is how to effectively fuse multimodal features. Most current approaches focus on integrating multimodal features while ignoring the correlation and consistency between features from different modalities, which leads to the inclusion of potentially irrelevant information. To address this issue, we introduce an innovative Multimodal Information Cross Transformer (MicFormer), which employs a dual-stream architecture to simultaneously extract features from each modality. Leveraging the Cross Transformer, it queries features from one modality and retrieves corresponding responses from another, facilitating effective communication between bimodal features. Additionally, we incorporate a deformable Transformer architecture to expand the search space. We conducted experiments on the MM-WHS dataset, and in the CT-MRI multimodal image segmentation task, we improved the whole-heart segmentation DICE score to 85.57 and MIoU to 75.51. Our method outperforms other multimodal segmentation techniques by margins of 2.83 and 4.23, respectively. This demonstrates the efficacy of MicFormer in integrating relevant information between different modalities in multimodal tasks. These findings hold significant implications for multimodal image tasks, and we believe that MicFormer has extensive potential for broader applications across various domains. Our code is available at https://github.com/fxxJuses/MICFormer
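The query-and-retrieve idea in the abstract can be illustrated with plain scaled dot-product cross-attention, in which tokens from one modality form the queries and tokens from the other supply the keys and values. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: it is single-head, uses random projection weights, and omits the deformable sampling and the dual-stream encoder. All names (`cross_attention`, `ct_tokens`, `mri_tokens`, the weight matrices) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Queries come from one modality; keys and values come from the other."""
    q = query_feats @ w_q                     # (n_q, d)
    k = context_feats @ w_k                   # (n_c, d)
    v = context_feats @ w_v                   # (n_c, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (n_q, n_c) similarity scores
    weights = softmax(scores, axis=-1)        # each query row sums to 1
    return weights @ v, weights               # responses retrieved from the other modality

rng = np.random.default_rng(0)
d = 8
ct_tokens = rng.normal(size=(16, d))    # hypothetical CT feature tokens
mri_tokens = rng.normal(size=(20, d))   # hypothetical MRI feature tokens

def random_projections():
    """Three random (d, d) matrices standing in for learned Q/K/V weights."""
    return [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3)]

# One attention call per direction, each with its own weights,
# so information flows both ways between the two streams.
ct_from_mri, attn_ct = cross_attention(ct_tokens, mri_tokens, *random_projections())
mri_from_ct, attn_mri = cross_attention(mri_tokens, ct_tokens, *random_projections())
```

Each output keeps the token count of the querying modality, so the retrieved responses can be merged back into that stream, for example through a residual connection.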

Code Implementations: 1 repo
