CV · LG · Mar 3, 2023

Domain adaptation using optimal transport for invariant learning using histopathology datasets

Kianoush Falahkheirkhah, Alex Lu, David Alvarez-Melis, Grace Huynh

Harvard, Microsoft

arXiv:2303.02241v1 · 6.88 citations · h-index: 20 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the critical issue of model generalization in histopathology for cancer diagnosis, though it is incremental as it builds on existing adversarial methods by incorporating distributional differences.

The paper tackles batch effects in histopathology models by proposing a domain adaptation method based on optimal transport. The method improves generalization to unseen institutions without requiring labels or retraining. On the Camelyon17 dataset, it reliably classifies a cancer phenotype unseen during training, where the previous adversarial approach does not.

Histopathology is critical for the diagnosis of many diseases, including cancer. Diagnostic protocols typically require pathologists to manually evaluate slides under a microscope, which is time-consuming and subjective, leading to interest in machine learning to automate analysis. However, computational techniques are limited by batch effects, where technical factors like differences in preparation protocol or scanners can alter the appearance of slides, causing models trained on one institution to fail when generalizing to others. Here, we propose a domain adaptation method that improves the generalization of histopathological models to data from unseen institutions, without the need for labels or retraining in these new settings. Our approach introduces an optimal transport (OT) loss that extends adversarial methods, which penalize models if images from different institutions can be distinguished in their representation space. Unlike previous methods, which operate on single samples, our loss accounts for distributional differences between batches of images. We show that on the Camelyon17 dataset, while both methods can adapt to global differences in color distribution, only our OT loss can reliably classify a cancer phenotype unseen during training. Together, our results suggest that OT improves generalization on rare but critical phenotypes that may only make up a small fraction of the total tiles and variation in a slide.
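To illustrate what a batch-level OT penalty might look like, here is a minimal sketch that computes an entropy-regularized OT cost (via Sinkhorn iterations) between two batches of embeddings. It is not the authors' implementation: the use of Sinkhorn, the squared Euclidean cost, and the names `sinkhorn_ot_loss`, `epsilon`, `n_iters`, `site_a`, and `site_b_shifted` are all assumptions chosen for illustration, and the toy 2-D points stand in for learned representations.

```python
import math
import random


def squared_distance(x, y):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def sinkhorn_ot_loss(batch_a, batch_b, epsilon=1.0, n_iters=200):
    """Entropy-regularized OT cost between two batches of embeddings.

    Hypothetical helper: each batch is treated as a uniform empirical
    distribution over its rows. This is plain (not log-domain) Sinkhorn,
    so very large costs relative to epsilon can underflow.
    """
    n, m = len(batch_a), len(batch_b)
    cost = [[squared_distance(x, y) for y in batch_b] for x in batch_a]
    # Gibbs kernel K = exp(-C / epsilon).
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns toward uniform marginals.
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(m))
             for i in range(n)]
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(m)]
    # Transport plan P = diag(u) K diag(v); the loss is <P, C>.
    return sum(u[i] * kernel[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))


# Toy stand-ins for embeddings from two institutions (assumed data).
rng = random.Random(0)
site_a = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(16)]
site_b_shifted = [[value + 2.0 for value in row] for row in site_a]

loss_same = sinkhorn_ot_loss(site_a, site_a)
loss_shifted = sinkhorn_ot_loss(site_a, site_b_shifted)
print(f"same batch: {loss_same:.3f}, shifted batch: {loss_shifted:.3f}")
```

The loss depends on how the two batches are distributed as a whole, not on any single sample, which is the property the abstract contrasts with per-sample adversarial penalties. In a training loop, a term like this would be added to the classification loss so that embeddings from different institutions are pushed toward the same distribution.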
