AnatoMask: Enhancing Medical Image Segmentation with Reconstruction-guided Self-masking
AnatoMask addresses two problems facing medical imaging researchers, scarce labeled data and inefficient self-supervised pretraining, and offers an incremental improvement over existing methods.
The paper tackles inefficient pretraining in masked image modeling for 3D medical image segmentation by proposing AnatoMask, which dynamically masks anatomically significant regions based on reconstruction loss. The authors report superior performance and scalability on four public datasets covering CT, MRI, and PET modalities.
Because labeled data are scarce, self-supervised learning (SSL), which extracts semantic representations from unlabeled data, has gained much attention in 3D medical image segmentation. Among SSL strategies, masked image modeling (MIM) has proven effective: it learns detailed representations by reconstructing randomly masked images. However, conventional MIM methods require extensive training data to perform well, which remains a challenge in medical imaging. Because random masking samples all regions of a medical image uniformly, it may overlook crucial anatomical regions and thus degrade pretraining efficiency. We propose AnatoMask, a novel MIM method that leverages reconstruction loss to dynamically identify and mask anatomically significant regions, improving pretraining efficacy. AnatoMask takes a self-distillation approach, in which the model learns both how to find more significant regions to mask and how to reconstruct those masked regions. To avoid suboptimal learning, AnatoMask progressively adjusts the pretraining difficulty using a masking dynamics function. We evaluated our method on four public datasets spanning multiple imaging modalities (CT, MRI, and PET). AnatoMask demonstrates superior performance and scalability compared to existing SSL methods. The code is available at https://github.com/ricklisz/AnatoMask.
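The idea of masking by reconstruction loss under a progressively harder schedule can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository above for that). It assumes per-patch reconstruction losses are already available as a flat list, for example from a teacher network in a self-distillation setup, and it assumes a linear schedule for the masking dynamics function. The names `guided_fraction` and `loss_guided_mask` and the `start`/`end` parameters are hypothetical.

```python
import random


def guided_fraction(step, total_steps, start=0.0, end=0.5):
    """Hypothetical linear masking dynamics function.

    Returns the share of masked patches that is chosen by reconstruction
    loss rather than at random. It grows from `start` to `end` over training,
    so pretraining begins close to random masking and becomes harder.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * progress


def loss_guided_mask(patch_losses, mask_ratio, guided_frac, rng):
    """Return a list of booleans (True = masked), one per patch.

    A `guided_frac` share of the masked patches are the patches with the
    highest reconstruction loss; the remainder is sampled uniformly from
    the patches that are left.
    """
    n = len(patch_losses)
    n_mask = round(mask_ratio * n)
    n_guided = round(guided_frac * n_mask)

    # Patch indices sorted from highest to lowest reconstruction loss.
    order = sorted(range(n), key=lambda i: patch_losses[i], reverse=True)
    guided = order[:n_guided]
    randomly_chosen = rng.sample(order[n_guided:], n_mask - n_guided)

    masked = set(guided) | set(randomly_chosen)
    return [i in masked for i in range(n)]


# Usage: halfway through training, mask 60% of 10 patches.
rng = random.Random(0)
losses = [rng.random() for _ in range(10)]
mask = loss_guided_mask(
    losses,
    mask_ratio=0.6,
    guided_frac=guided_fraction(step=50, total_steps=100),
    rng=rng,
)
```

Keeping part of the mask random in this sketch is a design assumption: it prevents the mask from collapsing onto the same high-loss patches at every step while the loss-guided share grows.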