A Foundation Model for Brain Lesion Segmentation with Mixture of Modality Experts
This work addresses the need for a unified model in neurological research and diagnosis to simplify deployment in real-world scenarios, though it is incremental as it builds on existing universal model concepts.
The authors tackled the problem of brain lesion segmentation across multiple lesion types and imaging modalities by proposing a universal foundation model with a Mixture of Modality Experts framework, achieving state-of-the-art performance on nine datasets and promising generalization to unseen data.
Brain lesion segmentation plays an essential role in neurological research and diagnosis. As brain lesions can be caused by various pathological alterations, different types of brain lesions tend to manifest with different characteristics on different imaging modalities. Due to this complexity, brain lesion segmentation methods are often developed in a task-specific manner. A specific segmentation model is developed for a particular lesion type and imaging modality. However, the use of task-specific models requires predetermination of the lesion type and imaging modality, which complicates their deployment in real-world scenarios. In this work, we propose a universal foundation model for 3D brain lesion segmentation, which can automatically segment different types of brain lesions for input data of various imaging modalities. We formulate a novel Mixture of Modality Experts (MoME) framework with multiple expert networks attending to different imaging modalities. A hierarchical gating network combines the expert predictions and fosters expertise collaboration. Furthermore, we introduce a curriculum learning strategy during training to avoid the degeneration of each expert network and preserve their specialization. We evaluated the proposed method on nine brain lesion datasets, encompassing five imaging modalities and eight lesion types. The results show that our model outperforms state-of-the-art universal models and provides promising generalization to unseen datasets.