CV | AI | Dec 28, 2022

Swin MAE: Masked Autoencoders for Small Datasets

arXiv:2212.13805v2 | 47 citations | h-index: 43 | Has Code
Originality: Synthesis-oriented
AI Analysis

This paper addresses the challenge of limited labeled data in medical image analysis, though it is an incremental adaptation of existing methods to a specific domain.

The paper tackles the problem of applying unsupervised learning to small medical image datasets by proposing Swin MAE, a masked autoencoder with a Swin Transformer backbone. Trained on only a few thousand medical images and without any pre-trained models, the method achieves downstream transfer learning results that equal or slightly outperform a supervised Swin Transformer trained on ImageNet.

The development of deep learning models in medical image analysis is largely limited by the lack of large, well-annotated datasets. Unsupervised learning does not require labels and is therefore better suited to medical image analysis problems. However, most current unsupervised learning methods need to be applied to large datasets. To make unsupervised learning applicable to small datasets, we propose Swin MAE, a masked autoencoder with Swin Transformer as its backbone. Even on a dataset of only a few thousand medical images and without using any pre-trained models, Swin MAE is still able to learn useful semantic features purely from images. In terms of transfer learning results on downstream tasks, it can equal or even slightly outperform the supervised model obtained by training Swin Transformer on ImageNet. The code is publicly available at https://github.com/Zian-Xu/Swin-MAE.
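The abstract describes Swin MAE as a masked autoencoder, so a rough illustration of the general masked-autoencoder pretext task may help: hide a random subset of image patches and score the reconstruction only on the hidden ones. The sketch below is a generic, standard-library-only illustration and is not taken from the Swin-MAE repository. The function names, the 14 x 14 patch grid, and the 0.75 mask ratio are all assumptions chosen for the example, and the encoder and decoder are omitted entirely.

```python
import random


def random_patch_mask(grid_h, grid_w, mask_ratio=0.75, seed=None):
    """Return a grid_h x grid_w grid of booleans; True marks a masked patch.

    Hypothetical helper: the mask ratio is an assumed value, not one
    stated in the source.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    rng = random.Random(seed)
    num_patches = grid_h * grid_w
    num_masked = int(num_patches * mask_ratio)
    # Sample patch indices without replacement so exactly num_masked are hidden.
    masked = set(rng.sample(range(num_patches), num_masked))
    return [
        [(row * grid_w + col) in masked for col in range(grid_w)]
        for row in range(grid_h)
    ]


def masked_reconstruction_loss(pred, target, mask):
    """Mean squared error averaged over masked patches only.

    pred and target are lists of flattened patches (lists of floats);
    mask is a flat list of booleans aligned with them.
    """
    total, count = 0.0, 0
    for pred_patch, target_patch, is_masked in zip(pred, target, mask):
        if is_masked:
            squared = sum((p - t) ** 2 for p, t in zip(pred_patch, target_patch))
            total += squared / len(target_patch)
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    mask = random_patch_mask(14, 14, mask_ratio=0.75, seed=0)
    flat_mask = [m for row in mask for m in row]
    print(sum(flat_mask), "of", len(flat_mask), "patches masked")
```

Because the loss ignores visible patches, a model trained this way is rewarded only for inferring hidden content from its context, which is why no labels are needed.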

Code Implementations: 1 repo
