Mimic-IV-ICD: A new benchmark for eXtreme MultiLabel Classification
The benchmark gives researchers in medical informatics a standardized basis for evaluating automated ICD coding models. The contribution is incremental, however, because it builds on existing datasets and methods.
The paper addresses the lack of widely accepted benchmarks for automated ICD coding by proposing a benchmark suite built on the MIMIC-IV EHR dataset. It establishes standardized data preprocessing and compares popular methods, with the goal of fostering reproducibility and accelerating progress in the field.
Each clinical note is assigned a set of ICD codes describing diagnoses and procedures. In recent years, predictive machine learning models have been built for automated ICD coding. However, there is a lack of widely accepted benchmarks for automated ICD coding models based on large-scale public EHR data. This paper proposes a public benchmark suite for ICD-10 coding using a large EHR dataset derived from MIMIC-IV, the most recent public EHR dataset. We standardize data preprocessing, establish a comprehensive ICD coding benchmark dataset, and implement and compare several popular methods for ICD code prediction. This approach fosters reproducibility and model comparison, accelerating progress toward employing automated ICD coding in future studies. Furthermore, we create a new ICD-9 benchmark from MIMIC-IV data, which provides more data points and a larger number of ICD codes than MIMIC-III. Our open-source code offers easy access to the data processing steps, benchmark creation, and experiment replication for those with MIMIC-IV access, providing insights, guidance, and protocols for efficiently developing ICD coding models.
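The extreme multilabel framing can be made concrete with a short example. Each note carries a set of ICD codes, and a model predicts that set as a multi-hot vector over the code vocabulary. The sketch below is illustrative only and is not the paper's preprocessing or evaluation code. The function names (`build_label_index`, `multi_hot`, `micro_f1`), the example codes, and the choice of micro-averaged F1 as the metric are all assumptions made for this illustration.

```python
from typing import Dict, List, Sequence, Set


def build_label_index(code_sets: Sequence[Set[str]]) -> Dict[str, int]:
    """Map every distinct code seen in the corpus to a column index (sorted for determinism)."""
    vocab = sorted({code for codes in code_sets for code in codes})
    return {code: i for i, code in enumerate(vocab)}


def multi_hot(codes: Set[str], index: Dict[str, int]) -> List[int]:
    """Encode one note's code set as a 0/1 vector; codes missing from the index are ignored."""
    vec = [0] * len(index)
    for code in codes:
        if code in index:
            vec[index[code]] = 1
    return vec


def micro_f1(gold: Sequence[Set[str]], pred: Sequence[Set[str]]) -> float:
    """Micro-averaged F1: pool true positives, false positives and false negatives over all notes."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # codes predicted and correct
        fp += len(p - g)  # codes predicted but not assigned
        fn += len(g - p)  # codes assigned but missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical example: three notes with illustrative ICD-10-style codes.
gold_sets = [{"I10", "E119"}, {"I10"}, {"N183", "I10"}]
pred_sets = [{"I10"}, {"I10", "E119"}, {"N183"}]

index = build_label_index(gold_sets)          # {"E119": 0, "I10": 1, "N183": 2}
vector = multi_hot(gold_sets[0], index)       # [1, 1, 0]
score = micro_f1(gold_sets, pred_sets)        # tp=3, fp=1, fn=2 -> 6/9
```

Micro-averaging pools counts across all codes, so frequent codes dominate the score. In an extreme multilabel setting with a long tail of rare codes, this is why benchmarks often report a macro-averaged metric alongside it.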