Hematoxylin and eosin stained oral squamous cell carcinoma histological images dataset
This dataset addresses a bottleneck for researchers and pathologists working on oral cancer diagnosis by providing a new labeled resource, though the contribution is incremental because it covers a single, specific domain.
The paper tackles the lack of labeled data for histological image segmentation by presenting the OCDC dataset, which includes 1,020 manually annotated H&E-stained images of oral squamous cell carcinoma for segmentation tasks.
Computer-aided diagnosis (CAD) can be an important tool to aid and enhance pathologists' diagnostic decision-making. Deep learning techniques, such as convolutional neural networks (CNNs) and fully convolutional networks (FCNs), have been successfully applied in medical and biological research. Unfortunately, histological image segmentation is often constrained by the availability of labeled training data, because labeling histological images for segmentation is a highly skilled, complex, and time-consuming task. This paper presents the hematoxylin and eosin (H&E) stained oral cavity-derived cancer (OCDC) dataset, a labeled dataset containing H&E-stained histological images of oral squamous cell carcinoma (OSCC) cases. The tumor regions in our dataset were labeled manually by a specialist and validated by a pathologist. The OCDC dataset comprises 1,020 histological images of 640 × 640 pixels, each with its tumor regions fully annotated for segmentation purposes. All histological images were digitized at 20× magnification.
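As an illustration of how image/annotation pairs of this kind might be sanity-checked before training, the following is a minimal sketch using NumPy. It assumes, hypothetically, that each image is a 640 × 640 RGB array and that each annotation is a 640 × 640 binary mask in which nonzero pixels mark tumor; the mask encoding, the function names `validate_pair` and `tumor_fraction`, and the synthetic arrays are assumptions for illustration and are not taken from the dataset description.

```python
import numpy as np

# Image side length in pixels, as stated in the dataset description.
EXPECTED_SIZE = 640


def validate_pair(image: np.ndarray, mask: np.ndarray, size: int = EXPECTED_SIZE) -> None:
    """Raise ValueError if an image/mask pair does not match the expected geometry.

    Assumption: images are RGB (H, W, 3) and masks are single-channel (H, W).
    """
    if image.shape != (size, size, 3):
        raise ValueError(f"expected image shape {(size, size, 3)}, got {image.shape}")
    if mask.shape != (size, size):
        raise ValueError(f"expected mask shape {(size, size)}, got {mask.shape}")


def tumor_fraction(mask: np.ndarray) -> float:
    """Return the fraction of pixels labeled as tumor.

    Assumption: any nonzero mask value denotes tumor.
    """
    return float(np.count_nonzero(mask)) / mask.size


# Synthetic stand-ins for one image and its annotation (not real dataset files).
image = np.zeros((EXPECTED_SIZE, EXPECTED_SIZE, 3), dtype=np.uint8)
mask = np.zeros((EXPECTED_SIZE, EXPECTED_SIZE), dtype=np.uint8)
mask[: EXPECTED_SIZE // 2, :] = 1  # mark the top half as tumor

validate_pair(image, mask)
print(f"tumor fraction: {tumor_fraction(mask):.2f}")
```

A check like this can flag mismatched or empty annotations early; with real files, the arrays would come from whatever image loader matches the distributed file format.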