BIMCV COVID-19+: a large annotated dataset of RX and CT images from COVID-19 patients
This dataset addresses the need for comprehensive medical imaging data for COVID-19 research and supports improved diagnosis and analysis. The contribution is incremental in that it builds on existing data collection efforts.
The paper introduces BIMCV COVID-19+, a large annotated dataset of chest X-ray and CT images from COVID-19 patients that includes radiological findings, reports, and diagnostic test results. It comprises 1,380 CX, 885 DX, and 163 CT studies from 1,311 patients, which the authors describe as, to the best of their knowledge, the largest COVID-19 image dataset available in an open format.
This paper describes BIMCV COVID-19+, a large dataset from the Valencian Region Medical ImageBank (BIMCV) containing chest X-ray (CXR) images (CR, DX) and computed tomography (CT) images of COVID-19+ patients, along with their radiological findings and locations, pathologies, radiological reports (in Spanish), DICOM metadata, polymerase chain reaction (PCR) tests, and immunoglobulin G (IgG) and immunoglobulin M (IgM) diagnostic antibody tests. The findings have been mapped onto standard Unified Medical Language System (UMLS) terminology and cover a wide spectrum of thoracic entities, in contrast to the considerably smaller number of entities annotated in previous datasets. Images are stored at high resolution, and entities are localized with anatomical labels and stored in Medical Imaging Data Structure (MIDS) format. In addition, 10 images were annotated by a team of radiologists to include semantic segmentation of radiological findings. This first iteration of the database includes 1,380 CX, 885 DX and 163 CT studies from 1,311 COVID-19+ patients. This is, to the best of our knowledge, the largest COVID-19+ dataset of images available in an open format. The dataset can be downloaded from http://bimcv.cipf.es/bimcv-projects/bimcv-covid19.
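The study counts above can be tallied with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and the total and the mean number of studies per patient are derived from the reported figures rather than stated in the abstract.

```python
# Study counts as reported in the abstract (first iteration of the dataset).
# The modality keys mirror the abstract's own labels.
studies_by_modality = {"CX": 1380, "DX": 885, "CT": 163}
patients = 1311

# Derived quantities (not stated in the abstract; computed here for illustration).
total_studies = sum(studies_by_modality.values())
xray_studies = studies_by_modality["CX"] + studies_by_modality["DX"]
studies_per_patient = total_studies / patients

print(f"total studies: {total_studies}")
print(f"X-ray studies: {xray_studies}")
print(f"mean studies per patient: {studies_per_patient:.2f}")
```

Under these figures, the dataset holds 2,428 studies in total, or roughly 1.85 studies per patient on average.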