AutoKary2022: A Large-Scale Densely Annotated Dataset for Chromosome Instance Segmentation
This dataset addresses the lack of densely annotated data for chromosome instance segmentation, a prerequisite for diagnosing chromosomal disorders, though the contribution is incremental in that it builds on existing efforts in medical imaging.
The authors tackled the problem of automated chromosome instance segmentation for karyotype analysis by constructing AutoKary2022, a large-scale dataset with over 27,000 densely annotated chromosome instances, and systematically evaluated representative methods to gain insights into the fundamental challenges.
Automated chromosome instance segmentation from metaphase cell microscopic images is critical for the diagnosis of chromosomal disorders (i.e., karyotype analysis). However, it remains a challenging task due to the lack of densely annotated datasets and the complicated morphologies of chromosomes, e.g., dense distribution, arbitrary orientations, and a wide range of lengths. To facilitate the development of this area, we take a big step forward and manually construct a large-scale densely annotated dataset named AutoKary2022, which contains over 27,000 chromosome instances in 612 microscopic images from 50 patients. Specifically, each instance is annotated with a polygonal mask and a class label to assist in precise chromosome detection and segmentation. Building on this dataset, we systematically investigate representative methods and obtain a number of interesting findings, which help us gain a deeper understanding of the fundamental problems in chromosome instance segmentation. We hope this dataset will advance research towards medical understanding. The dataset is available at: https://github.com/wangjuncongyu/chromosome-instance-segmentation-dataset.
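To make the annotation scheme concrete, the following is a minimal sketch of how a single instance, described by a polygonal mask and a class label, might be represented and converted to a binary pixel mask. It is not the dataset's actual file format or loader, which the abstract does not specify: the `ChromosomeInstance` record, its field names, the helper functions, and the example label are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class ChromosomeInstance:
    """Hypothetical record: one polygonal mask plus one class label."""
    polygon: List[Point]  # vertices as (x, y) pixel coordinates (assumed layout)
    label: str            # class label; the naming scheme here is an assumption


def polygon_area(polygon: List[Point]) -> float:
    """Area enclosed by a simple polygon, computed with the shoelace formula."""
    total = 0.0
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def bounding_box(polygon: List[Point]) -> Tuple[float, float, float, float]:
    """Axis-aligned box (x_min, y_min, x_max, y_max) around the polygon."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def point_in_polygon(x: float, y: float, polygon: List[Point]) -> bool:
    """Even-odd ray casting: count edge crossings to the right of (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross it.
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon: List[Point], width: int, height: int) -> List[List[int]]:
    """Binary mask: a pixel is 1 if its center lies inside the polygon."""
    return [
        [1 if point_in_polygon(col + 0.5, row + 0.5, polygon) else 0
         for col in range(width)]
        for row in range(height)
    ]


# Toy example: a rectangular "chromosome" with a made-up label.
instance = ChromosomeInstance(
    polygon=[(1.0, 1.0), (4.0, 1.0), (4.0, 3.0), (1.0, 3.0)],
    label="chr1",
)
mask = rasterize(instance.polygon, width=5, height=4)
```

In practice, instance segmentation pipelines usually rasterize polygons with an imaging library rather than a pure-Python loop; the version above only illustrates the relationship between a polygon annotation, its bounding box (for detection), and its pixel mask (for segmentation).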