Modeling Diagnostic Label Correlation for Automatic ICD Coding
This work addresses automatic ICD coding, a task intended to support healthcare professionals. Its contribution is incremental: it builds on existing predictors by adding a module that captures label correlation.
The paper tackles the challenge of predicting diagnostic codes from clinical notes by addressing the dependencies between labels, which are often ignored in existing methods. The proposed two-stage framework improves performance on benchmark MIMIC datasets by learning label set distributions as a reranking module.
Given the clinical notes written in electronic health records (EHRs), predicting the diagnostic codes is challenging; the task is formulated as multi-label classification. The large label set, the hierarchical dependencies among codes, and the imbalanced data make this prediction task extremely hard. Most existing work builds a binary predictor for each label independently, ignoring the dependencies between labels. To address this problem, we propose a two-stage framework that improves automatic ICD coding by capturing label correlation. Specifically, we train a label set distribution estimator to rescore the probability of each label set candidate generated by a base predictor. This paper is the first attempt at learning the label set distribution as a reranking module for medical code prediction. In the experiments, our proposed framework improves upon the best-performing predictors on the benchmark MIMIC datasets. The source code of this project is available at https://github.com/MiuLab/ICD-Correlation.
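The two-stage idea described in the abstract (a base predictor proposes label set candidates, then a label set distribution estimator rescores them) can be illustrated with a minimal sketch. This is not the authors' implementation: the candidate generation scheme, the pairwise co-occurrence scorer standing in for a trained estimator, the weight `alpha`, and the labels `A` to `D` are all assumptions made for illustration.

```python
import math
from itertools import combinations


def base_log_prob(label_set, probs):
    """Log-probability of a label set when each label is predicted independently.

    `probs` maps each label to a probability strictly between 0 and 1.
    """
    total = 0.0
    for label, p in probs.items():
        total += math.log(p) if label in label_set else math.log(1.0 - p)
    return total


def generate_candidates(probs, top_n=4, max_size=3):
    """Enumerate small subsets of the top_n most probable labels.

    Hypothetical candidate scheme; the paper may generate candidates differently.
    """
    top = sorted(probs, key=probs.get, reverse=True)[:top_n]
    candidates = []
    for size in range(1, max_size + 1):
        candidates.extend(frozenset(c) for c in combinations(top, size))
    return candidates


def toy_set_log_prob(label_set, pair_scores):
    """Toy stand-in for a trained label set distribution estimator.

    Sums hand-written pairwise scores; a real estimator would be learned from data.
    """
    pairs = combinations(sorted(label_set), 2)
    return sum(pair_scores.get(frozenset(pair), 0.0) for pair in pairs)


def rerank(probs, set_scorer, alpha=1.0):
    """Return the candidate with the best combined score.

    Combined score = base log-probability + alpha * set-level score (assumed form).
    """
    candidates = generate_candidates(probs)
    return max(candidates, key=lambda s: base_log_prob(s, probs) + alpha * set_scorer(s))


# Illustrative per-label probabilities from a hypothetical base predictor.
probs = {"A": 0.9, "B": 0.45, "C": 0.55, "D": 0.1}

# Illustrative correlation scores: A and B tend to co-occur, A and C rarely do.
pair_scores = {frozenset({"A", "B"}): 1.0, frozenset({"A", "C"}): -1.0}

# With alpha = 0 the set-level score is ignored and only the base predictor counts.
independent_choice = rerank(probs, lambda s: 0.0, alpha=0.0)

# With the set-level score included, correlated labels can change the chosen set.
reranked_choice = rerank(probs, lambda s: toy_set_log_prob(s, pair_scores))

print(sorted(independent_choice), sorted(reranked_choice))
```

In this toy setting the independent scoring keeps label C because its probability is above 0.5, while the rescoring step prefers the pair the set-level scorer considers likely to co-occur. The point is only to show where label correlation enters the decision, not how the actual estimator is trained.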