MAESIL: Masked Autoencoder for Enhanced Self-supervised Medical Image Learning
This work addresses the challenge of efficient pre-training for 3D medical imaging tasks and offers a practical solution for researchers and practitioners in medical AI, though the contribution is incremental, since it builds on existing self-supervised learning frameworks.
The paper tackles the problem of training deep learning models for 3D medical imaging such as CT scans, where labeled data are scarce and pre-training on natural images introduces a domain shift. It proposes MAESIL, a self-supervised learning framework that uses 3D superpatches and a masked autoencoder to capture structural information, and reports significant improvements over existing methods in reconstruction metrics such as PSNR and SSIM.
Training deep learning models for three-dimensional (3D) medical imaging, such as Computed Tomography (CT), is fundamentally challenged by the scarcity of labeled data. While pre-training on natural images is common, it introduces a significant domain shift that limits performance. Self-Supervised Learning (SSL) on unlabeled medical data has emerged as a powerful solution, but prominent frameworks often fail to exploit the inherent 3D nature of CT scans. These methods typically process 3D scans as a collection of independent 2D slices, an approach that discards critical axial coherence and 3D structural context. To address this limitation, we propose the masked autoencoder for enhanced self-supervised medical image learning (MAESIL), a novel self-supervised learning framework designed to capture 3D structural information efficiently. The core innovation is the 'superpatch', a 3D chunk-based input unit that balances 3D context preservation with computational efficiency. Our framework partitions the volume into superpatches and employs a 3D masked autoencoder with a dual-masking strategy to learn comprehensive spatial representations. We validated our approach on three diverse large-scale public CT datasets. Our experimental results show that MAESIL significantly outperforms existing methods such as AE, VAE, and VQ-VAE on key reconstruction metrics such as PSNR and SSIM. This establishes MAESIL as a robust and practical pre-training solution for 3D medical imaging tasks.
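To make the superpatch idea and the reconstruction metric more concrete, here is a minimal NumPy sketch under stated assumptions. The abstract does not give the superpatch geometry, the masking ratio, or the details of the dual-masking strategy, so this sketch assumes (hypothetically) that a superpatch is a non-overlapping block of `depth` consecutive slices by `patch` x `patch` pixels, and it shows only a single standard MAE-style random masking step, not MAESIL's dual masking. The function names `partition_superpatches`, `random_mask`, and `psnr` are illustrative helpers, not the authors' code; SSIM is omitted for brevity.

```python
import numpy as np


def partition_superpatches(volume, depth, patch):
    """Split a (D, H, W) volume into non-overlapping 3D blocks.

    Hypothetical superpatch geometry: `depth` consecutive slices by
    `patch` x `patch` pixels. Returns an array of shape
    (num_superpatches, depth, patch, patch), ordered with the W-axis
    grid index varying fastest.
    """
    D, H, W = volume.shape
    if D % depth or H % patch or W % patch:
        raise ValueError("volume shape must be divisible by the superpatch shape")
    # Split each axis into (grid index, offset within the block).
    blocks = volume.reshape(D // depth, depth, H // patch, patch, W // patch, patch)
    # Move the three grid axes to the front, then flatten them into one token axis.
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5)
    return blocks.reshape(-1, depth, patch, patch)


def random_mask(num_tokens, mask_ratio, rng):
    """Standard MAE-style random masking (assumed, not MAESIL's dual masking).

    Returns a boolean array where True marks a masked superpatch.
    """
    num_masked = int(round(num_tokens * mask_ratio))
    mask = np.zeros(num_tokens, dtype=bool)
    mask[rng.permutation(num_tokens)[:num_masked]] = True
    return mask


def psnr(reference, reconstruction, data_range):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    diff = reference.astype(np.float64) - reconstruction.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)


rng = np.random.default_rng(0)
volume = rng.random((32, 64, 64))  # synthetic stand-in for a normalized CT volume
superpatches = partition_superpatches(volume, depth=8, patch=16)
mask = random_mask(len(superpatches), mask_ratio=0.75, rng=rng)
visible = superpatches[~mask]  # only these would be fed to the encoder
```

With a 32 x 64 x 64 volume and 8 x 16 x 16 superpatches, the volume yields 64 tokens, and a 0.75 ratio masks 48 of them. Keeping whole slabs of consecutive slices together inside one token is what lets each token carry axial context that a slice-by-slice 2D pipeline would lose.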