Towards a Comprehensive Benchmark for Pathological Lymph Node Metastasis in Breast Cancer Sections
This work addresses the need for high-quality, clinically relevant benchmarks in computational pathology, specifically for AI-driven diagnostic support in breast cancer metastasis detection. The contribution is incremental in that it builds on existing datasets.
The study reprocessed 1,399 whole slide images from the Camelyon datasets, correcting labels and providing expert annotations. It upgraded the task from binary to four-class classification of pathological lymph node metastasis in breast cancer and benchmarked AI methods on the cleaned dataset.
Advances in optical microscopy scanning have significantly contributed to computational pathology (CPath) by converting traditional histopathological slides into whole slide images (WSIs). This development enables comprehensive digital review by pathologists and accelerates AI-driven diagnostic support for WSI analysis. Recent advances in pathology foundation models have increased the need for benchmarking tasks. The Camelyon series is one of the most widely used open-source datasets in computational pathology. However, the quality, accessibility, and clinical relevance of its labels have not been comprehensively evaluated. In this study, we reprocessed 1,399 WSIs and labels from the Camelyon-16 and Camelyon-17 datasets, removing low-quality slides, correcting erroneous labels, and providing expert pixel-level annotations for tumor regions in the previously unreleased test set. Based on the sizes of the re-annotated tumor regions, we upgraded the binary cancer screening task to a four-class task: negative, micro-metastasis, macro-metastasis, and isolated tumor cells (ITC). We re-evaluated pre-trained pathology feature extractors and multiple instance learning (MIL) methods on the cleaned dataset, providing a benchmark that advances AI development in histopathology.
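
The abstract does not state the size thresholds behind the four classes. As a minimal sketch, the Python snippet below assumes the cut-offs commonly used in pN staging and in the Camelyon-17 challenge description (ITC: largest lesion at most 0.2 mm; micro-metastasis: larger than 0.2 mm and at most 2 mm; macro-metastasis: larger than 2 mm). It ignores the alternative cell-count criterion, and the names `NodeLabel` and `classify_slide` are hypothetical, not taken from the paper.

```python
from enum import Enum
from typing import Sequence


class NodeLabel(Enum):
    NEGATIVE = "negative"
    ITC = "isolated tumor cells"
    MICRO = "micro-metastasis"
    MACRO = "macro-metastasis"


# Assumed thresholds (not given in the abstract), in millimetres.
ITC_MAX_MM = 0.2
MICRO_MAX_MM = 2.0


def classify_slide(lesion_diameters_mm: Sequence[float]) -> NodeLabel:
    """Map the annotated tumor regions of one slide to a slide-level label.

    `lesion_diameters_mm` holds the largest diameter of each annotated
    tumor region; an empty sequence means no tumor was annotated.
    """
    if any(d <= 0 for d in lesion_diameters_mm):
        raise ValueError("lesion diameters must be positive")
    if not lesion_diameters_mm:
        return NodeLabel.NEGATIVE

    # The slide label is driven by the largest lesion on the slide.
    largest = max(lesion_diameters_mm)
    if largest > MICRO_MAX_MM:
        return NodeLabel.MACRO
    if largest > ITC_MAX_MM:
        return NodeLabel.MICRO
    return NodeLabel.ITC
```

For example, under these assumptions `classify_slide([0.5, 3.0])` yields `NodeLabel.MACRO`, because the 3.0 mm lesion exceeds the 2 mm cut-off.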