CV · Aug 7, 2025

Multimodal Causal-Driven Representation Learning for Generalizable Medical Image Segmentation

Xusheng Liang, Lihua Zhou, Nianxin Li, Miao Xu, Ziyang Song, Dong Yi, Jinlin Wu, Hongbin Liu, Jiebo Luo, Zhen Lei

arXiv:2508.05008v1 · 2 citations · h-index: 12

Originality: Incremental advance

AI Analysis

This paper addresses domain generalization in medical image segmentation, which is crucial for reliable results across varied clinical settings. It is an incremental advance that combines existing techniques.

The paper tackles poor cross-domain generalization in medical image segmentation. It proposes Multimodal Causal-Driven Representation Learning (MCDRL), which integrates causal inference with vision-language models to remove domain-specific variations while preserving anatomical information. In the reported experiments, MCDRL achieves higher segmentation accuracy than competing methods and shows robust generalizability.

Vision-Language Models (VLMs), such as CLIP, have demonstrated remarkable zero-shot capabilities in various computer vision tasks. However, their application to medical imaging remains challenging due to the high variability and complexity of medical data. Specifically, medical images often exhibit significant domain shifts caused by various confounders, including equipment differences, procedure artifacts, and imaging modes, which can lead to poor generalization when models are applied to unseen domains. To address this limitation, we propose Multimodal Causal-Driven Representation Learning (MCDRL), a novel framework that integrates causal inference with the VLM to tackle domain generalization in medical image segmentation. MCDRL is implemented in two steps: first, it leverages CLIP's cross-modal capabilities to identify candidate lesion regions and construct a confounder dictionary through text prompts, specifically designed to represent domain-specific variations; second, it trains a causal intervention network that utilizes this dictionary to identify and eliminate the influence of these domain-specific variations while preserving the anatomical structural information critical for segmentation tasks. Extensive experiments demonstrate that MCDRL consistently outperforms competing methods, yielding superior segmentation accuracy and exhibiting robust generalizability.
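The abstract does not spell out how the causal intervention uses the confounder dictionary. One common approach in the causal-inference literature is backdoor adjustment, which replaces the observational P(y | x) with P(y | do(x)) = Σ_z P(y | x, z) P(z), summing over the confounder entries z. The toy sketch below assumes that approach and is not the authors' network: the two-entry "dictionary" (think of two scanner types), all probabilities, and the function names are hypothetical.

```python
import numpy as np

# Hypothetical toy numbers, not taken from the paper.
# z indexes entries of a confounder dictionary (e.g. two scanner types).
p_z = np.array([0.5, 0.5])            # prior P(z)

# P(x=1 | z): how often an image feature x appears under each confounder.
p_x1_given_z = np.array([0.9, 0.1])

# P(y=1 | x, z): rows index x in {0, 1}, columns index z.
p_y1_given_xz = np.array([[0.2, 0.4],
                          [0.3, 0.5]])


def observational(x: int) -> float:
    """P(y=1 | x): each z is weighted by the posterior P(z | x)."""
    p_x_given_z = p_x1_given_z if x == 1 else 1.0 - p_x1_given_z
    p_z_given_x = p_x_given_z * p_z
    p_z_given_x = p_z_given_x / p_z_given_x.sum()
    return float(p_y1_given_xz[x] @ p_z_given_x)


def interventional(x: int) -> float:
    """P(y=1 | do(x)): backdoor adjustment, each z weighted by the prior P(z)."""
    return float(p_y1_given_xz[x] @ p_z)


if __name__ == "__main__":
    for x in (0, 1):
        print(x, round(observational(x), 3), round(interventional(x), 3))
```

In this toy, the feature x=1 appears to lower P(y=1) observationally because it co-occurs with the confounder value under which y is less likely. After adjustment it raises P(y=1) for every z. This is the kind of domain-induced spurious correlation that an intervention over a confounder dictionary is meant to remove.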
