CV, Jul 13, 2021

Cats, not CAT scans: a study of dataset similarity in transfer learning for 2D medical image classification

arXiv:2107.05940v1
Originality: Incremental advance
AI Analysis

This work addresses the problem of selecting appropriate source datasets for transfer learning in medical image classification. The advance is incremental: it builds on existing transfer learning practice, contributing empirical evidence and challenging prior assumptions.

The study systematically investigates the impact of source dataset choice in transfer learning for 2D medical image classification, finding that ImageNet yields the highest performance but larger datasets are not always better, and common intuitions about data similarity are inaccurate for predicting optimal sources.

Transfer learning is a commonly used strategy for medical image classification, especially via pretraining on source data and fine-tuning on target data. There is currently no consensus on how to choose appropriate source data, and the literature contains both evidence favoring large natural image datasets such as ImageNet and evidence favoring more specialized medical datasets. In this paper we perform a systematic study with nine source datasets with natural or medical images, and three target medical datasets, all with 2D images. We find that ImageNet is the source leading to the highest performance, but also that larger datasets are not necessarily better. We also study different definitions of data similarity. We show that common intuitions about similarity may be inaccurate, and therefore not sufficient to predict an appropriate source a priori. Finally, we discuss several steps needed for further research in this field, especially with regard to other types of medical images (for example 3D). Our experiments and pretrained models are available via https://www.github.com/vcheplygina/cats-scans.

