IV, CV, QM | Dec 16, 2019

Lung and Colon Cancer Histopathological Image Dataset (LC25000)

arXiv:1912.12142v1 | 337 citations
Originality: Synthesis-oriented
AI Analysis

LC25000 gives AI researchers working on cancer diagnosis a freely available, validated dataset. The contribution is incremental, since it adds to existing medical image resources rather than introducing a new method.

The authors tackled the scarcity of ML-ready medical image datasets, especially for cancer pathology, by creating LC25000: 25,000 color images in 5 classes of 5,000 images each, covering malignant and benign lung and colon tissue.

The field of Machine Learning, a subset of Artificial Intelligence, has led to remarkable advancements in many areas, including medicine. Machine Learning algorithms require large datasets to train computer models successfully. Although there are medical image datasets available, more image datasets are needed from a variety of medical entities, especially cancer pathology. Even more scarce are ML-ready image datasets. To address this need, we created an image dataset (LC25000) with 25,000 color images in 5 classes. Each class contains 5,000 images of the following histologic entities: colon adenocarcinoma, benign colonic tissue, lung adenocarcinoma, lung squamous cell carcinoma, and benign lung tissue. All images are de-identified, HIPAA compliant, validated, and freely available for download to AI researchers.
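The class balance described in the abstract (5 classes, 5,000 images each) is easy to check after download. The following is a minimal sketch, not the authors' tooling. It assumes the archive has been unpacked so that each class sits in its own folder somewhere under a root directory. The function names, the accepted file suffixes, and the example path are hypothetical.

```python
from collections import Counter
from pathlib import Path

# Assumption: images are stored as JPEG or PNG files, one folder per class.
IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}

# Figures stated in the abstract.
EXPECTED_CLASSES = 5
EXPECTED_PER_CLASS = 5000


def count_images_per_class(root):
    """Count image files under `root`, keyed by the name of their parent folder."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            counts[path.parent.name] += 1
    return dict(counts)


def matches_abstract(counts, n_classes=EXPECTED_CLASSES, per_class=EXPECTED_PER_CLASS):
    """Return True if `counts` has `n_classes` classes with `per_class` images each."""
    return len(counts) == n_classes and all(n == per_class for n in counts.values())
```

A call such as `matches_abstract(count_images_per_class("path/to/LC25000"))` would return `True` only if the unpacked copy has exactly five class folders with 5,000 images each. Keying on the parent folder name assumes that no two class folders share a name.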

Code Implementations: 5 repos
