Virchow2: Scaling Self-Supervised Mixed Magnification Models in Pathology
This work aims to improve downstream performance in computational pathology for researchers and clinicians. The contribution is incremental: it builds on existing scaling approaches and adds modifications tailored to pathology.
The authors address the challenge of optimizing foundation models for computational pathology by scaling both data and model size and by applying domain-specific algorithmic modifications. Their models, Virchow2 (632M parameters) and Virchow2G (1.9B parameters), each trained on 3.1 million histopathology whole slide images, achieve state-of-the-art performance on 12 tile-level tasks.
Foundation models are rapidly being developed for computational pathology applications. However, it remains an open question which factors are most important for downstream performance, with data scale and diversity, model size, and training algorithm all playing a role. In this work, we propose algorithmic modifications tailored for pathology, and we present the results of scaling both data and model size, surpassing previous studies in both dimensions. We introduce three new models: Virchow2, a 632 million parameter vision transformer; Virchow2G, a 1.9 billion parameter vision transformer; and Virchow2G Mini, a 22 million parameter distillation of Virchow2G. Each is trained with 3.1 million histopathology whole slide images covering diverse tissues, originating institutions, and stains. We achieve state-of-the-art performance on 12 tile-level tasks compared to the top-performing competing models. Our results suggest that data diversity and domain-specific methods can outperform models that scale only in the number of parameters, but that, on average, performance benefits from the combination of domain-specific methods, data scale, and model scale.