CL · Jun 24, 2021

Modeling Diagnostic Label Correlation for Automatic ICD Coding

arXiv:2106.12800v1727 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses automatic ICD coding, a task relevant to healthcare professionals. Its contribution is incremental: it builds on existing predictors by adding a module that captures label correlation.

The paper tackles the challenge of predicting diagnostic codes from clinical notes by modeling the dependencies between labels, which existing methods often ignore. The proposed two-stage framework improves performance on the benchmark MIMIC datasets by learning a label set distribution and using it as a reranking module.

Given the clinical notes written in electronic health records (EHRs), predicting the diagnostic codes is challenging; the task is formulated as multi-label classification. The large label set, the hierarchical dependency among labels, and the imbalanced data make this prediction task extremely hard. Most existing work builds a binary predictor for each label independently, ignoring the dependencies between labels. To address this problem, we propose a two-stage framework that improves automatic ICD coding by capturing label correlation. Specifically, we train a label set distribution estimator to rescore the probability of each label set candidate generated by a base predictor. This paper is the first attempt at learning the label set distribution as a reranking module for medical code prediction. In the experiments, our proposed framework improves upon the best-performing predictors on the benchmark MIMIC datasets. The source code of this project is available at https://github.com/MiuLab/ICD-Correlation.
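The two-stage idea described in the abstract (a base predictor proposes label set candidates, then a label set estimator rescores them) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the candidate generation by subset enumeration, the frequency-based toy estimator, the weight `alpha`, the function names, and the placeholder labels `A`, `B`, `C` are all assumptions made for illustration only. The actual method is in the linked repository.

```python
import itertools
import math
from collections import Counter


def base_log_prob(label_set, probs):
    """Log-probability of a label set when each label is predicted independently.

    Assumes every probability is strictly between 0 and 1.
    """
    return sum(
        math.log(p if label in label_set else 1.0 - p)
        for label, p in probs.items()
    )


def generate_candidates(probs, top_n=3, max_candidates=5):
    """Stage 1 (assumed scheme): enumerate subsets of the top_n labels and
    keep the max_candidates most probable ones under the base predictor."""
    top = sorted(probs, key=probs.get, reverse=True)[:top_n]
    subsets = [
        frozenset(combo)
        for r in range(len(top) + 1)
        for combo in itertools.combinations(top, r)
    ]
    subsets.sort(key=lambda s: base_log_prob(s, probs), reverse=True)
    return subsets[:max_candidates]


def make_frequency_estimator(training_sets, smoothing=0.5):
    """Toy stand-in for a label set estimator: smoothed log-frequency of the
    exact label set among the training label sets."""
    counts = Counter(frozenset(s) for s in training_sets)
    total = len(training_sets)

    def log_score(label_set):
        return math.log((counts[label_set] + smoothing) / (total + smoothing))

    return log_score


def rerank(probs, estimator, alpha=1.0, **kwargs):
    """Stage 2: order candidates by base score plus alpha times estimator score."""
    candidates = generate_candidates(probs, **kwargs)
    return sorted(
        candidates,
        key=lambda s: base_log_prob(s, probs) + alpha * estimator(s),
        reverse=True,
    )


# Hypothetical per-label probabilities from a base predictor.
probs = {"A": 0.6, "B": 0.45, "C": 0.55}
# Hypothetical training label sets in which A and B always appear together.
estimator = make_frequency_estimator([{"A", "B"}, {"A", "B"}, {"A", "B"}, {"C"}])

print(sorted(generate_candidates(probs)[0]))  # ['A', 'C']: best set under independence
print(sorted(rerank(probs, estimator)[0]))    # ['A', 'B']: best set after rescoring
```

In this toy setup the independent base predictor prefers `{A, C}`, while the reranker promotes `{A, B}` because that combination is frequent in the training label sets. This is the kind of label correlation that per-label binary predictors cannot express.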

Code Implementations: 1 repo
