CV · Apr 3, 2024

Cross-Modal Conditioned Reconstruction for Language-guided Medical Image Segmentation

arXiv:2404.02845v2 · 30 citations · h-index: 34 · Has Code · IEEE Transactions on Medical Imaging
AI Analysis

The paper addresses segmentation results that are inconsistent with the accompanying medical notes, a practical concern for clinicians. The contribution is incremental: it builds on prior language-guided methods such as LViT.

The paper tackles language-guided medical image segmentation with a cross-modal conditioned reconstruction method that explicitly aligns visual features with medical notes. It reports a 3.74% mIoU improvement over LViT on MosMedData+, along with relative reductions of 20.2% in parameter count and 55.5% in computation.
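
For readers unfamiliar with the metric behind the headline number, the following is a minimal sketch of mean intersection-over-union (mIoU) for integer label maps. It uses one common definition (per-class IoU averaged over classes); the paper's exact averaging convention is not stated on this page, and the function name `mean_iou` and the toy arrays are illustrative assumptions, not taken from the RecLMIS code.

```python
import numpy as np


def mean_iou(pred, target, num_classes=2):
    """Average per-class IoU between two integer label maps (hypothetical helper)."""
    pred = np.asarray(pred)
    target = np.asarray(target)
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps, so it is skipped
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union)
    return float(np.mean(ious)) if ious else float("nan")


# Toy 2x3 masks (made-up values, for illustration only).
pred = np.array([[0, 1, 1], [0, 0, 1]])
target = np.array([[0, 1, 0], [0, 0, 1]])
score = mean_iou(pred, target)
```

On this toy pair, the background IoU is 3/4 and the foreground IoU is 2/3, so `score` is their average.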

Recent developments underscore the potential of textual information in enhancing learning models for a deeper understanding of medical visual semantics. However, language-guided medical image segmentation still faces a challenging issue. Previous works employ implicit and ambiguous architectures to embed textual information. This leads to segmentation results that are inconsistent with the semantics represented by the language, sometimes even diverging significantly. To this end, we propose a novel cross-modal conditioned Reconstruction for Language-guided Medical Image Segmentation (RecLMIS) to explicitly capture cross-modal interactions, which assumes that well-aligned medical visual features and medical notes can effectively reconstruct each other. We introduce conditioned interaction to adaptively predict patches and words of interest. Subsequently, they are utilized as conditioning factors for mutual reconstruction to align with regions described in the medical notes. Extensive experiments demonstrate the superiority of our RecLMIS, surpassing LViT by 3.74% mIoU on the publicly available MosMedData+ dataset and achieving an average increase of 1.89% mIoU for cross-domain tests on our QATA-CoV19 dataset. Simultaneously, we achieve a relative reduction of 20.2% in parameter count and a 55.5% decrease in computational load. The code will be available at https://github.com/ShashankHuang/RecLMIS.
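
The abstract's core assumption, that well-aligned patch and word features can reconstruct each other, can be illustrated with a rough sketch. This is not the RecLMIS implementation: it swaps in plain scaled dot-product attention for the reconstruction step, omits the conditioned interaction that selects patches and words of interest, and every name here (`cross_reconstruct`, `mutual_reconstruction_loss`, the feature shapes) is a hypothetical choice made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_reconstruct(queries, context):
    """Rebuild each query vector as an attention-weighted mix of context vectors."""
    d = queries.shape[-1]
    attn = softmax(queries @ context.T / np.sqrt(d), axis=-1)
    return attn @ context


def mutual_reconstruction_loss(patches, words):
    """Sum of the two mean-squared reconstruction errors (assumed loss form)."""
    words_hat = cross_reconstruct(words, patches)    # words rebuilt from patches
    patches_hat = cross_reconstruct(patches, words)  # patches rebuilt from words
    loss_text = np.mean((words_hat - words) ** 2)
    loss_visual = np.mean((patches_hat - patches) ** 2)
    return float(loss_text + loss_visual)


# Random stand-ins for 16 patch features and 6 word features of width 32.
rng = np.random.default_rng(0)
patches = rng.normal(size=(16, 32))
words = rng.normal(size=(6, 32))
loss = mutual_reconstruction_loss(patches, words)
```

Under this simplification, the loss shrinks as each modality becomes expressible in terms of the other, which is the intuition behind using reconstruction as an alignment signal.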

Code Implementations: 2 repos