IVJan 6, 2023
An interpretable machine learning system for colorectal cancer diagnosis from pathology slidesPedro C. Neto, Diana Montezuma, Sara P. Oliveira et al.
Considering the profound transformation affecting pathology practice, we aimed to develop a scalable artificial intelligence (AI) system to diagnose colorectal cancer from whole-slide images (WSI). For this, we propose a deep learning (DL) system that learns from weak labels, a sampling strategy that reduces the number of training samples by a factor of six without compromising performance, an approach to leverage a small subset of fully annotated samples, and a prototype with explainable predictions, active learning features and parallelisation. Noting some problems in the literature, this study is conducted with one of the largest WSI colorectal samples dataset with approximately 10,500 WSIs. Of these samples, 900 are testing samples. Furthermore, the robustness of the proposed method is assessed with two additional external datasets (TCGA and PAIP) and a dataset of samples collected directly from the proposed prototype. Our proposed method predicts, for the patch-based tiles, a class based on the severity of the dysplasia and uses that information to classify the whole slide. It is trained with an interpretable mixed-supervision scheme to leverage the domain knowledge introduced by pathologists through spatial annotations. The mixed-supervision scheme allowed for an intelligent sampling strategy effectively evaluated in several different scenarios without compromising the performance. On the internal dataset, the method shows an accuracy of 93.44% and a sensitivity between positive (low-grade and high-grade dysplasia) and non-neoplastic samples of 0.996. On the external test samples varied with TCGA being the most challenging dataset with an overall accuracy of 84.91% and a sensitivity of 0.996.
CVJul 26, 2024
A Survey on Cell Nuclei Instance Segmentation and Classification: Leveraging Context and AttentionJoão D. Nunes, Diana Montezuma, Domingos Oliveira et al.
Manually annotating nuclei from the gigapixel Hematoxylin and Eosin (H&E)-stained Whole Slide Images (WSIs) is a laborious and costly task, meaning automated algorithms for cell nuclei instance segmentation and classification could alleviate the workload of pathologists and clinical researchers and at the same time facilitate the automatic extraction of clinically interpretable features. But due to high intra- and inter-class variability of nuclei morphological and chromatic features, as well as H&E-stains susceptibility to artefacts, state-of-the-art algorithms cannot correctly detect and classify instances with the necessary performance. In this work, we hypothesise context and attention inductive biases in artificial neural networks (ANNs) could increase the generalization of algorithms for cell nuclei instance segmentation and classification. We conduct a thorough survey on context and attention methods for cell nuclei instance segmentation and classification from H&E-stained microscopy imaging, while providing a comprehensive discussion of the challenges being tackled with context and attention. Besides, we illustrate some limitations of current approaches and present ideas for future research. As a case study, we extend both a general instance segmentation and classification method (Mask-RCNN) and a tailored cell nuclei instance segmentation and classification model (HoVer-Net) with context- and attention-based mechanisms, and do a comparative analysis on a multi-centre colon nuclei identification and counting dataset. Although pathologists rely on context at multiple levels while paying attention to specific Regions of Interest (RoIs) when analysing and annotating WSIs, our findings suggest translating that domain knowledge into algorithm design is no trivial task, but to fully exploit these mechanisms, the scientific understanding of these methods should be addressed.
CVMay 5
DALPHIN: Benchmarking Digital Pathology AI Copilots Against Pathologists on an Open Multicentric DatasetCarlijn Lems, Sander Moonemans, Natálie Klubíčková et al.
Foundation models with visual question answering capabilities for digital pathology are emerging. Such unprecedented technology requires independent benchmarking to assess its potential in assisting pathologists in routine diagnostics. We created DALPHIN, the first multicentric open benchmark for pathology AI copilots, comprising 1236 images from 300 cases, spanning 130 rare to common diagnoses, 6 countries, and 14 subspecialties. The DALPHIN design and dataset are introduced alongside a human performance benchmark of 31 pathologists from 10 countries with varying expertise. We report results for two general-purpose (GPT-5, Gemini 2.5 Pro) and one pathology-specific copilot (PathChat+) for sequential and independent answer generation. We observed no statistically significant difference from expert-level performance in four of six tasks for PathChat, 2/6 tasks for Gemini, and 1/6 tasks for GPT. DALPHIN is publicly released with sequestered, indirectly accessible ground truth to foster robust and enduring benchmarking. Data, methods, and the evaluation platform are accessible through dalphin.grand-challenge.org.
IVJun 23, 2025
GANs vs. Diffusion Models for virtual staining with the HER2match datasetPascal Klöckner, José Teixeira, Diana Montezuma et al.
Virtual staining is a promising technique that uses deep generative models to recreate histological stains, providing a faster and more cost-effective alternative to traditional tissue chemical staining. Specifically for H&E-HER2 staining transfer, despite a rising trend in publications, the lack of sufficient public datasets has hindered progress in the topic. Additionally, it is currently unclear which model frameworks perform best for this particular task. In this paper, we introduce the HER2match dataset, the first publicly available dataset with the same breast cancer tissue sections stained with both H&E and HER2. Furthermore, we compare the performance of several Generative Adversarial Networks (GANs) and Diffusion Models (DMs), and implement a novel Brownian Bridge Diffusion Model for H&E-HER2 translation. Our findings indicate that, overall, GANs perform better than DMs, with only the BBDM achieving comparable results. Furthermore, we emphasize the importance of data alignment, as all models trained on HER2match produced vastly improved visuals compared to the widely used consecutive-slide BCI dataset. This research provides a new high-quality dataset ([available upon publication acceptance]), improving both model training and evaluation. In addition, our comparison of frameworks offers valuable guidance for researchers working on the topic.
CVMar 13, 2025
CountPath: Automating Fragment Counting in Digital PathologyAna Beatriz Vieira, Maria Valente, Diana Montezuma et al.
Quality control of medical images is a critical component of digital pathology, ensuring that diagnostic images meet required standards. A pre-analytical task within this process is the verification of the number of specimen fragments, a process that ensures that the number of fragments on a slide matches the number documented in the macroscopic report. This step is important to ensure that the slides contain the appropriate diagnostic material from the grossing process, thereby guaranteeing the accuracy of subsequent microscopic examination and diagnosis. Traditionally, this assessment is performed manually, requiring significant time and effort while being subject to significant variability due to its subjective nature. To address these challenges, this study explores an automated approach to fragment counting using the YOLOv9 and Vision Transformer models. Our results demonstrate that the automated system achieves a level of performance comparable to expert assessments, offering a reliable and efficient alternative to manual counting. Additionally, we present findings on interobserver variability, showing that the automated approach achieves an accuracy of 86%, which falls within the range of variation observed among experts (82-88%), further supporting its potential for integration into routine pathology workflows.
CVNov 24, 2025
Leveraging Adversarial Learning for Pathological Fidelity in Virtual StainingJosé Teixeira, Pascal Klöckner, Diana Montezuma et al.
In addition to evaluating tumor morphology using H&E staining, immunohistochemistry is used to assess the presence of specific proteins within the tissue. However, this is a costly and labor-intensive technique, for which virtual staining, as an image-to-image translation task, offers a promising alternative. Although recent, this is an emerging field of research with 64% of published studies just in 2024. Most studies use publicly available datasets of H&E-IHC pairs from consecutive tissue sections. Recognizing the training challenges, many authors develop complex virtual staining models based on conditional Generative Adversarial Networks, but ignore the impact of adversarial loss on the quality of virtual staining. Furthermore, overlooking the issues of model evaluation, they claim improved performance based on metrics such as SSIM and PSNR, which are not sufficiently robust to evaluate the quality of virtually stained images. In this paper, we developed CSSP2P GAN, which we demonstrate to achieve heightened pathological fidelity through a blind pathological expert evaluation. Furthermore, while iteratively developing our model, we study the impact of the adversarial loss and demonstrate its crucial role in the quality of virtually stained images. Finally, while comparing our model with reference works in the field, we underscore the limitations of the currently used evaluation metrics and demonstrate the superior performance of CSSP2P GAN.