CV · Nov 14, 2022

Self-training of Machine Learning Models for Liver Histopathology: Generalization under Clinical Shifts

arXiv:2211.07692v1 · 3 citations · h-index: 15
Originality: Synthesis-oriented
AI Analysis

This work addresses the annotation bottleneck in histopathology, where labels must come from specialized pathologists. The contribution is incremental: it applies an existing self-training method to a specific clinical domain.

The paper tackles limited annotations in liver histopathology by applying teacher-student self-training to Non-alcoholic Steatohepatitis (NASH) datasets. The best student model outperforms the teacher by 3% absolute on macro F1 and approaches the performance of a fully supervised model trained with twice as many annotations.

Histopathology images are gigapixel-sized and include features and information at different resolutions. Collecting annotations in histopathology requires highly specialized pathologists, making it expensive and time-consuming. Self-training can alleviate annotation constraints by learning from both labeled and unlabeled data, reducing the amount of annotations required from pathologists. We study the design of teacher-student self-training systems for Non-alcoholic Steatohepatitis (NASH) using clinical histopathology datasets with limited annotations. We evaluate the models on in-distribution and out-of-distribution test data under clinical data shifts. We demonstrate that through self-training, the best student model statistically outperforms the teacher with a $3\%$ absolute difference on the macro F1 score. The best student model also approaches the performance of a fully supervised model trained with twice as many annotations.
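The teacher-student idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: a toy nearest-centroid classifier stands in for the histopathology network, the confidence threshold of 0.9 is an assumed filtering rule, and all function names (`fit_centroids`, `predict_proba`, `self_train`, `macro_f1`) are hypothetical. It shows one round of training a teacher on labeled data, pseudo-labeling unlabeled data, and training a student on the union, followed by macro F1 as the unweighted mean of per-class F1 scores.

```python
import math

def fit_centroids(X, y, n_classes):
    """Toy stand-in for a model: one mean feature vector per class."""
    dim = len(X[0])
    sums = [[0.0] * dim for _ in range(n_classes)]
    counts = [0] * n_classes
    for x, c in zip(X, y):
        counts[c] += 1
        for j, v in enumerate(x):
            sums[c][j] += v
    # Assumption: every class appears at least once in y.
    return [[s / counts[c] for s in sums[c]] for c in range(n_classes)]

def predict_proba(centroids, x):
    """Softmax over negative squared distances to each centroid."""
    logits = [-sum((a - b) ** 2 for a, b in zip(x, c)) for c in centroids]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(centroids, x):
    """Return the most probable class index for one sample."""
    probs = predict_proba(centroids, x)
    return probs.index(max(probs))

def self_train(X_lab, y_lab, X_unlab, n_classes, threshold=0.9):
    """One round: teacher -> confident pseudo-labels -> student."""
    teacher = fit_centroids(X_lab, y_lab, n_classes)
    X_student, y_student = list(X_lab), list(y_lab)
    for x in X_unlab:
        probs = predict_proba(teacher, x)
        conf = max(probs)
        if conf >= threshold:  # assumed rule: keep confident predictions only
            X_student.append(x)
            y_student.append(probs.index(conf))
    student = fit_centroids(X_student, y_student, n_classes)
    n_pseudo = len(X_student) - len(X_lab)
    return teacher, student, n_pseudo

def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_classes
```

Because macro F1 weights every class equally, a rare class counts as much as a common one, which is presumably why it suits imbalanced clinical labels. Comparing `macro_f1` for `teacher` and `student` on the same held-out set mirrors the kind of comparison the abstract reports.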

