Beyond Multiple Instance Learning: Full Resolution All-In-Memory End-To-End Pathology Slide Modeling
This work addresses a computational bottleneck in computational pathology that matters for improving diagnostics and biomarkers. It is incremental, building on existing methods, but it enables end-to-end training.
The paper tackles the challenge of training AI models on gigapixel pathology slides. It proposes a novel approach that jointly trains tile encoders and slide aggregators fully in memory and end-to-end at high resolution, bridging the gap between the input and slide-level supervision. Detailed quantitative validation shows promise for large-scale pre-training and fine-tuning.
Artificial Intelligence (AI) has great potential to improve health outcomes by training systems on vast digitized clinical datasets. Computational Pathology, with its massive amounts of microscopy image data and its impact on diagnostics and biomarkers, is at the forefront of this development. Gigapixel pathology slides pose a unique challenge due to their enormous size and are usually divided into tens of thousands of smaller tiles for analysis. This introduces a discontinuity into the machine learning process: the training of tile-level encoders is separated from that of slide-level aggregators, and weakly supervised learning strategies become necessary. Training models on entire pathology slides end-to-end has remained largely unexplored due to its computational challenges. To overcome this problem, we propose a novel approach to jointly train both a tile encoder and a slide aggregator fully in memory and end-to-end at high resolution, bridging the gap between input and slide-level supervision. Although the approach is more computationally expensive, detailed quantitative validation shows promise for large-scale pre-training and fine-tuning of pathology foundation models.
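The difference between the usual two-stage pipeline and joint end-to-end training can be illustrated with a toy model. The sketch below is a hypothetical illustration under stated assumptions, not the paper's implementation: a "slide" is a small bag of 2-dimensional tiles, the tile encoder is a single tanh layer, and the slide aggregator is assumed to be mean pooling followed by a logistic head (the paper's actual encoder and aggregator architectures are not specified here). The names make_toy_slides, loss_and_grads, train and mean_loss are invented for this example. With end_to_end=False the encoder is frozen and only the aggregator learns, mimicking a multiple instance learning setup on precomputed features; with end_to_end=True the slide-level loss is backpropagated through pooling into the encoder weights. The toy keeps every tile in memory trivially and says nothing about how to do so at gigapixel scale, which is the actual difficulty the paper addresses.

```python
import math
import random

# Hypothetical toy sketch (not the paper's method): tile encoder h = tanh(W x),
# slide aggregator = mean pooling + logistic head, slide-level label y in {0, 1}.

def make_toy_slides(n_slides=20, n_tiles=8, seed=0):
    """Hypothetical toy data: positive slides contain two 'lesion' tiles."""
    rng = random.Random(seed)
    slides, labels = [], []
    for i in range(n_slides):
        y = i % 2
        tiles = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n_tiles)]
        if y:
            for t in tiles[:2]:
                t[0] = 2.0  # lesion tiles carry a strong signal in feature 0
        slides.append(tiles)
        labels.append(y)
    return slides, labels

def forward(W, v, b, tiles):
    """Encode every tile, mean-pool the embeddings, return (H, z, logit s)."""
    H = [[math.tanh(sum(w * x for w, x in zip(row, t))) for row in W] for t in tiles]
    z = [sum(h[k] for h in H) / len(H) for k in range(len(W))]
    s = sum(vk * zk for vk, zk in zip(v, z)) + b
    return H, z, s

def loss_and_grads(W, v, b, tiles, y):
    """Slide-level binary cross-entropy and its gradients w.r.t. W, v and b."""
    H, z, s = forward(W, v, b, tiles)
    # Numerically stable BCE with logits: -[y log sig(s) + (1 - y) log(1 - sig(s))]
    loss = max(s, 0.0) - s * y + math.log1p(math.exp(-abs(s)))
    p = 1.0 / (1.0 + math.exp(-s)) if s >= 0 else math.exp(s) / (1.0 + math.exp(s))
    ds = p - y                                  # dL/ds
    gv = [ds * zk for zk in z]                  # aggregator head gradients
    gb = ds
    gW = [[0.0] * len(W[0]) for _ in W]         # encoder gradients
    for h, x in zip(H, tiles):
        for k in range(len(W)):
            # dL/dh_k = ds * v_k / n (mean pooling); tanh'(a) = 1 - h_k^2
            da = ds * v[k] / len(tiles) * (1.0 - h[k] ** 2)
            for j, xj in enumerate(x):
                gW[k][j] += da * xj
    return loss, gW, gv, gb

def train(slides, labels, d_hid=4, end_to_end=True, epochs=300, lr=0.2, seed=0):
    """Per-slide SGD. end_to_end=False freezes the (randomly initialised)
    encoder, as in a two-stage pipeline; True also updates W from the slide label."""
    rng = random.Random(seed)
    d_in = len(slides[0][0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(d_in)] for _ in range(d_hid)]
    W0 = [row[:] for row in W]                  # copy of the initial encoder
    v, b = [0.0] * d_hid, 0.0
    for _ in range(epochs):
        for tiles, y in zip(slides, labels):
            _, gW, gv, gb = loss_and_grads(W, v, b, tiles, y)
            v = [vk - lr * g for vk, g in zip(v, gv)]
            b -= lr * gb
            if end_to_end:
                W = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W, gW)]
    return W0, W, v, b

def mean_loss(W, v, b, slides, labels):
    """Average slide-level loss over a dataset."""
    return sum(loss_and_grads(W, v, b, t, y)[0] for t, y in zip(slides, labels)) / len(slides)

if __name__ == "__main__":
    slides, labels = make_toy_slides()
    for mode in (False, True):
        W0, W, v, b = train(slides, labels, end_to_end=mode)
        name = "end-to-end" if mode else "frozen encoder"
        print(name, round(mean_loss(W, v, b, slides, labels), 3))
```

The only difference between the two modes is whether the encoder gradient gW, which exists only because the slide-level loss is differentiated through the pooling step, is applied. In a real gigapixel setting that gradient requires keeping activations for every tile of a slide available for backpropagation, which is why end-to-end training is far more memory-intensive than training an aggregator on precomputed features.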