IV | CV | Aug 28, 2024

Benchmarking foundation models as feature extractors for weakly-supervised computational pathology

Peter Neidlinger, Omar S. M. El Nahhas, Hannah Sophie Muti, Tim Lenz, Michael Hoffmeister, Hermann Brenner, Marko van Treeck, Rupert Langer, Bastian Dislich, Hans Michael Behrens, Christoph Röcken, Sebastian Foersch

arXiv:2408.15823v2 | 29.765 citations | h-index: 43

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for independent evaluation of pathology foundation models on external cohorts and clinically relevant tasks, and it offers actionable insights for improving AI in computational pathology. Its contribution is incremental, since it benchmarks existing models rather than introducing new methods.

The study benchmarked 19 histopathology foundation models on 13 patient cohorts comprising 6,818 patients and 9,528 slides from lung, colorectal, gastric, and breast cancers. A vision-language model (CONCH) performed best, with Virchow2 a close second, and an ensemble combining CONCH and Virchow2 predictions outperformed the individual models in 55% of tasks.

Advancements in artificial intelligence have driven the development of numerous pathology foundation models capable of extracting clinically relevant information. However, there is currently limited literature independently evaluating these foundation models on truly external cohorts and clinically relevant tasks to uncover adjustments for future improvements. In this study, we benchmarked 19 histopathology foundation models on 13 patient cohorts with 6,818 patients and 9,528 slides from lung, colorectal, gastric, and breast cancers. The models were evaluated on weakly supervised tasks related to biomarkers, morphological properties, and prognostic outcomes. We show that a vision-language foundation model, CONCH, yielded the highest performance when compared to vision-only foundation models, with Virchow2 as a close second. The experiments reveal that foundation models trained on distinct cohorts learn complementary features to predict the same label, and can be fused to outperform the current state of the art. An ensemble combining CONCH and Virchow2 predictions outperformed individual models in 55% of tasks, leveraging their complementary strengths in classification scenarios. Moreover, our findings suggest that data diversity outweighs data volume for foundation models. Our work highlights actionable adjustments to improve pathology foundation models.
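The abstract does not say how the CONCH and Virchow2 predictions are combined, so the following is only a minimal sketch of one common option: late fusion by averaging the two models' predicted probabilities per patient. The function names, the use of AUROC as the metric, and all scores below are assumptions made up for illustration; they are not the authors' method or results.

```python
from typing import List, Sequence


def average_ensemble(prob_a: Sequence[float], prob_b: Sequence[float]) -> List[float]:
    """Late fusion (assumed): mean of two models' positive-class probabilities per patient."""
    if len(prob_a) != len(prob_b):
        raise ValueError("both models must score the same patients")
    return [(a + b) / 2.0 for a, b in zip(prob_a, prob_b)]


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up toy scores for six patients (hypothetical, not data from the paper).
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.4, 0.8, 0.5, 0.2, 0.1]  # stands in for one model's predictions
model_b = [0.6, 0.9, 0.3, 0.2, 0.4, 0.5]  # stands in for the other model's predictions
fused = average_ensemble(model_a, model_b)

for name, scores in [("model_a", model_a), ("model_b", model_b), ("fused", fused)]:
    print(name, round(auroc(labels, scores), 3))
```

In this toy case each model misranks a different patient, so the averaged scores separate the classes better than either model alone. That is the kind of complementarity the abstract describes, though real cohorts will not behave this cleanly.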
