CL, AI | Aug 24, 2020

Prediction of ICD Codes with Clinical BERT Embeddings and Text Augmentation with Label Balancing using MIMIC-III

Brent Biseda, Gaurav Desai, Haifeng Lin, Anish Philip

arXiv:2008.10492v1 | 16 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of automated medical coding for healthcare professionals, but it is incremental as it builds on existing methods with minor improvements.

The paper tackles the ICD code prediction task using the MIMIC-III dataset, achieving state-of-the-art results with an F1 score of 0.75 for the top 50 ICD codes through Clinical BERT embeddings and text augmentation with label balancing.

This paper achieves state-of-the-art results for the ICD code prediction task using the MIMIC-III dataset. This was achieved through the use of Clinical BERT (Alsentzer et al., 2019) embeddings, text augmentation, and label balancing to improve F1 scores for both ICD chapter and ICD disease codes. We attribute the improved performance mainly to the use of novel text augmentation that shuffles the order of sentences during training. In comparison to the top-32 ICD code prediction (Keyang Xu et al.), which reports an F1 score of 0.76, we achieve a final F1 score of 0.75, but on the top 50 ICD codes.
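The abstract names sentence-order shuffling and label balancing but gives no procedure, so the following is only a minimal sketch of how the two ideas could be combined, not the authors' implementation. It assumes one label per note (the real task is multi-label), a naive split on periods, and oversampling up to the majority-class count; the function names `shuffle_sentences` and `augment_and_balance` are hypothetical.

```python
import random
from collections import Counter


def shuffle_sentences(note, rng):
    """Return a copy of `note` with its sentences in a random order."""
    # Assumption: a naive split on "." stands in for a real clinical sentence splitter.
    sentences = [s.strip() for s in note.split(".") if s.strip()]
    rng.shuffle(sentences)
    return ". ".join(sentences) + "."


def augment_and_balance(notes, labels, seed=0):
    """Oversample every label up to the majority count using shuffled copies."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    # Group the original notes by label so augmented copies keep their label.
    by_label = {}
    for note, label in zip(notes, labels):
        by_label.setdefault(label, []).append(note)

    out_notes, out_labels = list(notes), list(labels)
    for label, group in by_label.items():
        # Add shuffled copies until this label matches the majority count.
        for _ in range(target - counts[label]):
            out_notes.append(shuffle_sentences(rng.choice(group), rng))
            out_labels.append(label)
    return out_notes, out_labels


if __name__ == "__main__":
    notes = ["Chest pain. Troponin elevated.", "Cough. Fever. Chills.", "Fracture of wrist. Cast applied."]
    labels = ["circulatory", "circulatory", "injury"]
    new_notes, new_labels = augment_and_balance(notes, labels)
    print(Counter(new_labels))
```

Because each augmented note contains the same sentences as an original, the label stays valid while the model sees a different token order, which is the intuition behind attributing the gain to shuffling.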
