CV · Feb 7, 2025

SurGen: 1020 H&E-stained Whole Slide Images With Survival and Genetic Markers

arXiv:2502.04946v2 · 9 citations · h-index: 16 · GigaScience
Originality: Incremental advance
AI Analysis

This dataset addresses the need for comprehensive resources that combine histopathological images with genetic and survival data, which are needed to advance computational pathology and personalised medicine in colorectal cancer research.

The authors introduced SurGen, a dataset of 1,020 H&E-stained whole-slide images from 843 colorectal cancer cases with genetic and survival data. They demonstrated its utility with a proof-of-concept model that predicts mismatch repair status directly from the slides, achieving a test area under the receiver operating characteristic curve (AUROC) of 0.8273. The dataset can facilitate research in biomarker discovery and prognostic modelling for colorectal cancer.

Cancer remains one of the leading causes of morbidity and mortality worldwide. Comprehensive datasets that combine histopathological images with genetic and survival data across various tumour sites are essential for advancing computational pathology and personalised medicine. We present SurGen, a dataset comprising 1,020 H&E-stained whole-slide images (WSIs) from 843 colorectal cancer cases. The dataset includes detailed annotations for key genetic mutations (KRAS, NRAS, BRAF) and mismatch repair status, as well as survival data for 426 cases. We illustrate SurGen's utility with a proof-of-concept model that predicts mismatch repair status directly from WSIs, achieving a test area under the receiver operating characteristic curve of 0.8273. These preliminary results underscore the dataset's potential to facilitate research in biomarker discovery, prognostic modelling, and advanced machine learning applications in colorectal cancer and beyond. SurGen offers a valuable resource for the scientific community, enabling studies that require high-quality WSIs linked with comprehensive clinical and genetic information on colorectal cancer. Our initial findings affirm the dataset's capacity to advance diagnostic precision and foster the development of personalised treatment strategies in colorectal oncology. Data available online: https://doi.org/10.6019/S-BIAD1285.
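The headline result is reported as an AUROC. To illustrate what a value such as 0.8273 measures, the minimal Python sketch below computes AUROC through its rank interpretation: the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, with ties counted as half. The labels, scores, and the choice of MMR-deficient as the positive class are hypothetical toy assumptions, not SurGen data, and the function is not the authors' evaluation code.

```python
from itertools import product


def auroc(labels, scores):
    """AUROC as P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    # Compare every positive/negative pair of scores.
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = MMR-deficient (assumed positive class), 0 = MMR-proficient.
# Scores are made-up model outputs, not SurGen predictions.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]
print(auroc(labels, scores))  # 5 of 6 pairs correctly ordered, so 5/6
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes. The pairwise loop is quadratic in the number of cases; for larger evaluation sets a sort-based computation gives the same value more efficiently.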

Code Implementations: 1 repo
