IV, CV, TO | Jun 11, 2025

A Cytology Dataset for Early Detection of Oral Squamous Cell Carcinoma

arXiv:2506.09661v1 | h-index: 9
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for accessible early OSCC diagnosis in resource-constrained regions, though it is incremental as it primarily provides a dataset rather than a novel AI method.

The authors address the limited accessibility of oral squamous cell carcinoma (OSCC) diagnosis in low-resource settings by introducing the first large, multicenter oral cytology dataset from India. It comprises annotated slides prepared with two staining protocols and is intended to advance AI-driven diagnostic methods for early detection.

Oral squamous cell carcinoma (OSCC) is a major global health burden, particularly in several regions across Asia, Africa, and South America, where it accounts for a significant proportion of cancer cases. Early detection dramatically improves outcomes, with stage I cancers achieving up to 90 percent survival. However, traditional diagnosis based on histopathology has limited accessibility in low-resource settings because it is invasive, resource-intensive, and reliant on expert pathologists. Oral cytology of brush biopsy, by contrast, offers a minimally invasive and lower-cost alternative, provided that the remaining challenges, namely inter-observer variability and the unavailability of expert pathologists, can be addressed using artificial intelligence. Development and validation of robust AI solutions require access to large, labeled, multi-source datasets to train high-capacity models that generalize across domain shifts. We introduce the first large, multicenter oral cytology dataset, comprising annotated slides stained with the Papanicolaou (PAP) and May-Grünwald-Giemsa (MGG) protocols, collected from ten tertiary medical centers in India. The dataset is labeled and annotated by expert pathologists for cellular anomaly classification and detection, and is designed to advance AI-driven diagnostic methods. By filling the gap in publicly available oral cytology datasets, this resource aims to enhance automated detection, reduce diagnostic errors, and improve early OSCC diagnosis in resource-constrained settings, ultimately contributing to reduced mortality and better patient outcomes worldwide.

Code Implementations: 1 repo
