QM · AI · IV · Jun 3, 2024

Immunocto: a massive immune cell database auto-generated for histopathology

arXiv:2406.02618v2
Originality: Incremental advance
AI Analysis

Immunocto provides a large-scale, publicly available resource for training models that study the tumor immune micro-environment on routine histopathology slides, addressing a bottleneck in computational pathology for cancer immunotherapy research.

The authors tackled the problem of characterizing the tumor immune micro-environment by introducing Immunocto, a massive database of 6,848,454 human cells, including 2,282,818 immune cells across four subtypes, automatically generated from dually stained tissue sections. They showed that deep learning models trained on this database achieve state-of-the-art performance for lymphocyte detection.

With the advent of novel cancer treatment options such as immunotherapy, studying the tumour immune micro-environment (TIME) is crucial to inform prognosis and to understand potential response to therapeutic agents. A key approach to characterising the TIME may be to combine (1) digitised, high-resolution optical microscopy images of hematoxylin and eosin (H&E) stained tissue sections obtained in routine histopathology examinations with (2) automated immune cell detection and classification methods. In this work, we introduce a workflow to automatically generate robust single-cell contours and labels from tissue sections dually stained with H&E and multiplexed immunofluorescence (IF) markers. The approach harnesses the Segment Anything Model and requires minimal human intervention compared to existing single-cell databases. With this methodology, we create Immunocto, a massive, automatically generated database of 6,848,454 human cells and objects, including 2,282,818 immune cells distributed across 4 subtypes: CD4$^+$ T cell lymphocytes, CD8$^+$ T cell lymphocytes, CD20$^+$ B cell lymphocytes, and CD68$^+$/CD163$^+$ macrophages. For each cell, we provide a $64\times64$ pixel H&E image at $40\times$ magnification, along with a binary mask of the nucleus and a label. The database, which is made publicly available, can be used to train models to study the TIME on routine H&E slides. We show that deep learning models trained on Immunocto achieve state-of-the-art performance for lymphocyte detection. The approach demonstrates the benefits of using matched H&E and IF data to generate robust databases for computational pathology applications.
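The per-cell record described in the abstract (a 64×64 H&E patch, a binary nucleus mask, and a label) can be sketched as a small data structure. This is a minimal illustration in plain Python, not the database's actual file format or loader. The class name `CellSample`, the label strings, the `"other"` catch-all class, and the helper functions are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple

PATCH_SIZE = 64  # patch side length in pixels, as stated in the abstract

# Immune subtypes named in the abstract. "other" is a hypothetical catch-all
# for the remaining cells and objects, not a label confirmed by the source.
LABELS = ("CD4_T", "CD8_T", "CD20_B", "CD68_CD163_macrophage", "other")


@dataclass
class CellSample:
    image: List[List[Tuple[int, int, int]]]  # PATCH_SIZE x PATCH_SIZE RGB pixels
    mask: List[List[int]]                    # PATCH_SIZE x PATCH_SIZE, values 0 or 1
    label: str

    def __post_init__(self) -> None:
        # Check that both grids have the expected square shape.
        for name, grid in (("image", self.image), ("mask", self.mask)):
            if len(grid) != PATCH_SIZE or any(len(row) != PATCH_SIZE for row in grid):
                raise ValueError(f"{name} must be {PATCH_SIZE}x{PATCH_SIZE}")
        if any(v not in (0, 1) for row in self.mask for v in row):
            raise ValueError("mask must be binary")
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label}")

    def nucleus_area(self) -> int:
        """Number of pixels covered by the nucleus mask."""
        return sum(sum(row) for row in self.mask)


def immune_fraction(immune: int, total: int) -> float:
    """Share of immune cells among all cells and objects."""
    return immune / total


def make_dummy_sample() -> CellSample:
    """Build a synthetic sample: white patch with an 8x8 square 'nucleus'."""
    image = [[(255, 255, 255)] * PATCH_SIZE for _ in range(PATCH_SIZE)]
    mask = [[0] * PATCH_SIZE for _ in range(PATCH_SIZE)]
    for r in range(28, 36):
        for c in range(28, 36):
            mask[r][c] = 1
    return CellSample(image=image, mask=mask, label="CD8_T")
```

Calling `immune_fraction(2_282_818, 6_848_454)` with the counts quoted in the abstract shows that immune cells make up one third of the database, since 2,282,818 × 3 = 6,848,454.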
