CV, AI, TO | Aug 29, 2023

A General-Purpose Self-Supervised Model for Computational Pathology

arXiv:2308.15474v1 | 67 citations | h-index: 35
Originality Highly original
AI Analysis

This work addresses the difficulty of annotating data at scale in computational pathology for clinical applications, and represents a significant advancement rather than an incremental improvement.

The paper tackles tissue phenotyping in computational pathology by introducing UNI, a general-purpose self-supervised model pretrained on over 100 million tissue patches from over 100,000 whole-slide images across 20 major tissue types. Evaluated on 33 clinical tasks, UNI outperforms previous state-of-the-art models and demonstrates new capabilities such as resolution-agnostic tissue classification and few-shot slide classification.

Tissue phenotyping is a fundamental computational pathology (CPath) task in learning objective characterizations of histopathologic biomarkers in anatomic pathology. However, whole-slide imaging (WSI) poses a complex computer vision problem in which the large-scale image resolutions of WSIs and the enormous diversity of morphological phenotypes preclude large-scale data annotation. Current efforts have proposed using pretrained image encoders with either transfer learning from natural image datasets or self-supervised pretraining on publicly available histopathology datasets, but these encoders have not been extensively developed and evaluated across diverse tissue types at scale. We introduce UNI, a general-purpose self-supervised model for pathology, pretrained using over 100 million tissue patches from over 100,000 diagnostic haematoxylin and eosin-stained WSIs across 20 major tissue types, and evaluated on 33 representative clinical tasks in CPath of varying diagnostic difficulties. In addition to outperforming previous state-of-the-art models, we demonstrate new modeling capabilities in CPath such as resolution-agnostic tissue classification, slide classification using few-shot class prototypes, and disease subtyping generalization in classifying up to 108 cancer types in the OncoTree code classification system. UNI advances unsupervised representation learning at scale in CPath in terms of both pretraining data and downstream evaluation, enabling data-efficient AI models that can generalize and transfer to a gamut of diagnostically challenging tasks and clinical workflows in anatomic pathology.
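
The abstract mentions classification using few-shot class prototypes without describing the procedure. As a rough illustration of the general idea only, and not the authors' implementation, the sketch below averages a handful of labeled embeddings per class into a prototype and assigns a query to the most similar prototype. The 2-D vectors, the class names, and the choice of cosine similarity are all assumptions made for this example; in practice the embeddings would come from a pretrained encoder.

```python
import math
from collections import defaultdict


def class_prototypes(embeddings, labels):
    """Average the embeddings of each class into one prototype vector."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def cosine_similarity(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict(query, prototypes):
    """Return the label whose prototype is most similar to the query."""
    return max(
        prototypes,
        key=lambda label: cosine_similarity(query, prototypes[label]),
    )


# Hypothetical toy data: two labeled examples ("shots") per class.
support = [[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.8]]
support_labels = ["tumor", "tumor", "normal", "normal"]

prototypes = class_prototypes(support, support_labels)
prediction = predict([0.8, 0.2], prototypes)
print(prediction)
```

Because the prototypes are plain averages, adding a new class needs only a few labeled examples and no retraining of the encoder, which is what makes this kind of approach data-efficient.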
