Foundation Models for Slide-level Cancer Subtyping in Digital Pathology
This work addresses the domain gap that arises when general-purpose vision models are applied to digital pathology for cancer diagnosis. Its contribution is incremental, since it compares existing methods rather than introducing new ones.
The paper tackles the problem of adapting computer vision models to digital pathology for cancer subtyping. It compares foundation models trained on histopathology data against ImageNet-pretrained models as feature extractors under a multiple instance learning framework, and finds that the foundation models outperform the ImageNet-pretrained models at predicting six skin cancer subtypes.
Since the emergence of the ImageNet dataset, the pretraining and fine-tuning approach has become widely adopted in computer vision due to the ability of ImageNet-pretrained models to learn a wide variety of visual features. However, a significant challenge arises when adapting these models to domain-specific fields, such as digital pathology, due to substantial gaps between domains. To address this limitation, foundation models (FMs) have been trained on large-scale in-domain datasets to learn the intricate features of histopathology images. In cancer diagnosis, whole-slide image (WSI) prediction is essential for patient prognosis, and multiple instance learning (MIL) has been implemented to handle the gigapixel size of WSIs. As MIL frameworks rely on patch-level feature aggregation, this work aims to compare the performance of various feature extractors developed under different pretraining strategies for cancer subtyping on WSIs under an MIL framework. Results demonstrate the ability of foundation models to surpass ImageNet-pretrained models for the prediction of six skin cancer subtypes.
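To make the phrase "patch-level feature aggregation" concrete, the following is a minimal sketch of one common MIL aggregator, attention-based pooling, written in NumPy. The abstract does not state which aggregator the authors used, so this choice is an assumption for illustration only. The function names, the feature dimension (768), the number of patches (500), and the randomly initialised weights are all hypothetical; in practice the patch features would come from a frozen feature extractor (a foundation model or an ImageNet-pretrained network) and the weights would be learned.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()


def attention_mil_pool(patch_features, V, w):
    """Aggregate patch features into one slide embedding.

    patch_features: (n_patches, feat_dim) array, one row per WSI patch.
    V: (feat_dim, hidden_dim) projection; w: (hidden_dim,) scoring vector.
    Returns the slide embedding (feat_dim,) and the attention weights (n_patches,).
    """
    hidden = np.tanh(patch_features @ V)        # (n_patches, hidden_dim)
    scores = hidden @ w                         # one scalar score per patch
    weights = softmax(scores)                   # non-negative, sums to 1
    slide_embedding = weights @ patch_features  # attention-weighted average
    return slide_embedding, weights


def classify_slide(slide_embedding, W_cls, b_cls):
    """Linear classifier on the slide embedding; returns class probabilities."""
    return softmax(slide_embedding @ W_cls + b_cls)


# Hypothetical sizes: 500 patches, 768-dim features, 6 subtypes.
rng = np.random.default_rng(0)
n_patches, feat_dim, hidden_dim, n_classes = 500, 768, 128, 6

# Stand-in for features produced by a frozen patch-level extractor.
patch_features = rng.standard_normal((n_patches, feat_dim))

# Randomly initialised parameters (these would be trained in practice).
V = rng.standard_normal((feat_dim, hidden_dim)) * 0.01
w = rng.standard_normal(hidden_dim)
W_cls = rng.standard_normal((feat_dim, n_classes)) * 0.01
b_cls = np.zeros(n_classes)

slide_embedding, attn = attention_mil_pool(patch_features, V, w)
probs = classify_slide(slide_embedding, W_cls, b_cls)
```

Under this setup, comparing pretraining strategies amounts to swapping the source of `patch_features` while keeping the aggregator and classifier fixed, so that differences in slide-level accuracy can be attributed to the feature extractor.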