Secondary Use of Clinical Problem List Entries for Neural Network-Based Disease Code Assignment
This work addresses the problem of efficient disease code assignment for healthcare systems, but it is incremental, as it applies existing neural network methods to clinical data.
The paper tackled automated ICD-10 coding from clinical problem list entries, achieving a top macro-averaged F1-score of 0.88 with a RoBERTa-based model, compared with 0.83 and 0.84 for the baselines, and identified inconsistent manual coding as a key limitation.
Clinical information systems have become large repositories of semi-structured and partly annotated electronic health record data, which have reached a critical mass that makes them attractive for supervised, data-driven neural network approaches. We explored automated coding of 50-character-long clinical problem list entries using the International Classification of Diseases (ICD-10) and evaluated three types of network architecture on the 100 most frequent ICD-10 three-digit codes. A fastText baseline reached a macro-averaged F1-score of 0.83, followed by a character-level LSTM with a macro-averaged F1-score of 0.84. The top-performing approach used a RoBERTa model with a custom language model, fine-tuned on the downstream coding task, and yielded a macro-averaged F1-score of 0.88. A neural network activation analysis, together with an investigation of the false positives and false negatives, revealed inconsistent manual coding as a main limiting factor.
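To make the evaluation setup more concrete, the following is a minimal sketch, not the authors' code, of two steps the abstract implies: collapsing full ICD-10 codes (e.g. `E11.9`) to their three-character category (`E11`) and keeping the 100 most frequent ones, and computing a macro-averaged F1-score as the unweighted mean of per-code F1 values. It assumes each problem list entry carries exactly one code (single-label classification); the function names `to_three_char`, `top_k_categories` and `macro_f1` are hypothetical.

```python
from collections import Counter


def to_three_char(code: str) -> str:
    """Collapse a full ICD-10 code such as 'E11.9' to its three-character category 'E11'."""
    return code.replace(".", "")[:3].upper()


def top_k_categories(codes, k=100):
    """Return the k most frequent three-character categories (assumed label set)."""
    counts = Counter(to_three_char(c) for c in codes)
    return [category for category, _ in counts.most_common(k)]


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-label F1 scores, assuming one label per entry."""
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    # Macro-averaging weights every code equally, so rare codes count as much as frequent ones.
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    y_true = ["E11", "E11", "I10", "J45"]
    y_pred = ["E11", "I10", "I10", "J45"]
    print(macro_f1(y_true, y_pred, labels=["E11", "I10", "J45"]))
```

In this toy example, `E11` and `I10` each get an F1 of 2/3 and `J45` gets 1.0, so the macro average is 7/9 (about 0.78). Because each code contributes equally, a few poorly predicted low-frequency codes can pull the macro score down noticeably, which is one reason inconsistent manual coding of individual codes matters for this metric.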