CV · Feb 4, 2025

LadderMIL: Multiple Instance Learning with Coarse-to-Fine Self-Distillation

Shuyang Wu, Yifu Qiu, Ines P. Nearchou, Sandrine Prost, Jonathan A. Fallowfield, Hideki Ueno, Hitoshi Tsuda, David J. Harrison, Hakan Bilen, Timothy J. Kendall

arXiv:2502.02707v4 · h-index: 36

Originality: Highly original

AI Analysis

This work addresses the challenge of integrating instance- and bag-level information in MIL for computational pathology, offering incremental improvements on clinically relevant tasks such as cancer classification and prognosis prediction.

The paper addresses Multiple Instance Learning (MIL) for whole slide image (WSI) analysis in computational pathology with LadderMIL, which adds instance-level supervision and inter-instance contextual learning. Across five benchmarks, it reports average improvements over the best baseline of 8.1% in AUC, 11% in F1-score, and 2.4% in C-index.
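For the prognosis task, the C-index (concordance index) measures how well predicted risk scores rank patients' survival times: 1.0 is a perfect ranking and 0.5 is chance. The summary does not say which estimator is used, so the following is only a generic sketch of Harrell's C-index; `harrell_c_index` is a hypothetical name, and skipping pairs with tied survival times is a simplifying assumption.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Harrell's concordance index (simplified sketch).

    A pair is comparable if the subject with the shorter time had an event
    (events[i] == 1). It is concordant if that subject also has the higher
    predicted risk; tied risks count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        # a = subject with the shorter survival time, b = the other one
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the order is unknown
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Subject 2 is censored at t=15; 4 of the 5 comparable pairs are concordant.
print(harrell_c_index([5, 10, 15, 20], [1, 1, 0, 1], [0.9, 0.7, 0.8, 0.1]))  # → 0.8
```

Only relative orderings matter, so any monotone transform of the risk scores leaves the C-index unchanged.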

Multiple Instance Learning (MIL) for whole slide image (WSI) analysis in computational pathology often neglects instance-level learning because supervision is typically provided only at the bag level, which hinders joint consideration of instance- and bag-level information. In this work, we present LadderMIL, a framework designed to improve MIL from two perspectives: (1) employing instance-level supervision and (2) learning inter-instance contextual information at the bag level. First, we propose a novel Coarse-to-Fine Self-Distillation (CFSD) paradigm that probes and distils a network trained with bag-level information to adaptively obtain instance-level labels, which then provide instance-level supervision for the same network in a self-improving way. Second, to capture inter-instance contextual information in WSIs, we propose a Contextual Encoding Generator (CEG), which encodes the contextual appearance of instances within a bag. We also establish the instance-level learnability of CFSD both theoretically and empirically. LadderMIL is evaluated on multiple clinically relevant benchmarking tasks, including breast cancer receptor status classification, multi-class subtype classification, tumour classification, and prognosis prediction. Compared with the best baseline, it achieves average improvements of 8.1%, 11%, and 2.4% in AUC, F1-score, and C-index, respectively, across five benchmarks.
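The CFSD idea, in which a network trained on bag labels is probed for instance labels that then supervise the same network, can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption rather than the LadderMIL implementation: ABMIL-style attention pooling serves as the probe, a top-k/bottom-k rule assigns pseudo-labels, and `attention_pool`, `pseudo_label_instances`, `instance_bce`, the instance head `u` and the parameter `k` are hypothetical names. The weights are random, so the example shows only the data flow, not a trained model.

```python
import numpy as np


def attention_pool(h, V, w):
    """ABMIL-style attention pooling (assumed stand-in for the bag-level model).

    h: (n, d) instance features; V: (d, d_att); w: (d_att,).
    Returns per-instance attention weights (n,) and the bag embedding (d,).
    """
    scores = np.tanh(h @ V) @ w          # unnormalised attention score per instance
    scores = scores - scores.max()       # shift for numerical stability
    a = np.exp(scores)
    a = a / a.sum()                      # softmax over the instances in the bag
    return a, a @ h


def pseudo_label_instances(attn, bag_label, k):
    """Hypothetical coarse-to-fine labelling rule (not the paper's exact rule).

    Negative bag: every instance is labelled 0 (standard MIL assumption).
    Positive bag: the k most-attended instances get 1, the k least-attended
    get 0, and the rest stay -1 (ignored by the instance loss).
    """
    n = attn.shape[0]
    labels = np.full(n, -1, dtype=int)
    if bag_label == 0:
        labels[:] = 0
        return labels
    k = min(k, n // 2)                   # keep the top and bottom sets disjoint
    if k == 0:
        return labels
    order = np.argsort(attn)             # indices sorted by ascending attention
    labels[order[-k:]] = 1
    labels[order[:k]] = 0
    return labels


def instance_bce(probs, labels, eps=1e-7):
    """Binary cross-entropy over pseudo-labelled instances only (labels >= 0)."""
    mask = labels >= 0
    if not mask.any():
        return 0.0
    p = np.clip(probs[mask], eps, 1 - eps)
    y = labels[mask]
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


# Toy positive bag: 8 instances with 16-d features and random (untrained) weights.
rng = np.random.default_rng(0)
h = rng.normal(size=(8, 16))
V = rng.normal(size=(16, 4))
w = rng.normal(size=4)
u = rng.normal(size=16)                  # hypothetical instance-level head

# Coarse: the bag-level view (bag_embedding would feed a bag classifier).
attn, bag_embedding = attention_pool(h, V, w)
labels = pseudo_label_instances(attn, bag_label=1, k=2)
# Fine: instance predictions from the same features, supervised by the pseudo-labels.
probs = 1.0 / (1.0 + np.exp(-(h @ u)))
loss = instance_bce(probs, labels)
print(labels, round(loss, 4))
```

In a full training loop, this step would be repeated so that pseudo-labels are refreshed as the bag-level attention improves; the actual probing, selection criteria and schedule used by LadderMIL may differ from this sketch.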
