Semi-Supervised Histology Classification using Deep Multiple Instance Learning and Contrastive Predictive Coding
This work addresses the scarcity of labeled data in medical imaging, a practical concern for pathologists. Its contribution is incremental, since it builds on existing MIL and CPC methods.
The paper tackles overfitting and poor feature learning in weakly supervised histology classification by proposing a two-stage semi-supervised approach that combines contrastive predictive coding (CPC) with attention-based multiple instance learning (MIL). On binary breast cancer histology classification, it reports state-of-the-art results across five random splits: a mean validation accuracy of 95% and an AUC of 0.968.
Convolutional neural networks can be trained to perform histology slide classification using weak annotations with multiple instance learning (MIL). However, given the paucity of labeled histology data, direct application of MIL can easily suffer from overfitting and the network is unable to learn rich feature representations due to the weak supervisory signal. We propose to overcome such limitations with a two-stage semi-supervised approach that combines the power of data-efficient self-supervised feature learning via contrastive predictive coding (CPC) and the interpretability and flexibility of regularized attention-based MIL. We apply our two-stage CPC + MIL semi-supervised pipeline to the binary classification of breast cancer histology images. Across five random splits, we report state-of-the-art performance with a mean validation accuracy of 95% and an area under the ROC curve of 0.968. We further evaluate the quality of features learned via CPC relative to simple transfer learning and show that strong classification performance using CPC features can be efficiently leveraged under the MIL framework even with the feature encoder frozen.
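To make the second stage concrete, the following is a minimal sketch of attention-based MIL pooling over a bag of frozen patch features. It assumes the commonly used form a_k = softmax_k(w^T tanh(V h_k)); the abstract does not state the exact attention formula or the regularization used, so the function name `attention_mil_pool`, the parameters `V` and `w`, and all dimensions are hypothetical and chosen only for illustration. Random vectors stand in for CPC-encoded patch features.

```python
import math
import random


def attention_mil_pool(instances, V, w):
    """Pool a bag of instance features into a single bag-level embedding.

    instances: list of K feature vectors, each a list of D floats
               (stand-ins for frozen encoder outputs, one per patch).
    V:         L x D projection matrix (hypothetical learned parameter).
    w:         attention vector of length L (hypothetical learned parameter).
    Returns (bag_embedding, attention_weights).
    """
    # Unnormalized attention score per instance: w^T tanh(V h_k).
    scores = []
    for h in instances:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(w_l * u_l for w_l, u_l in zip(w, hidden)))

    # Numerically stable softmax over the instances in the bag.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Bag embedding is the attention-weighted sum of instance features.
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return bag, weights


# Illustrative sizes only: 5 patches, 8-dim features, 4-dim attention space.
rng = random.Random(0)
K, D, L = 5, 8, 4
bag_features = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(K)]
V = [[rng.gauss(0.0, 0.5) for _ in range(D)] for _ in range(L)]
w = [rng.gauss(0.0, 0.5) for _ in range(L)]

bag_embedding, attention = attention_mil_pool(bag_features, V, w)
```

In a full pipeline the bag embedding would feed a small classifier that produces the slide-level prediction, and the attention weights indicate which patches contributed most, which is the source of the interpretability mentioned above. Because the encoder is frozen in this setting, only `V`, `w`, and the classifier would need to be trained.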