LG CL ML

Dec 12, 2019

Training without training data: Improving the generalizability of automated medical abbreviation disambiguation

Marta Skreta, Aryan Arbabi, Jixuan Wang, Michael Brudno

arXiv:1912.06174v13.43 citations

Originality: Incremental advance

AI Analysis

This work improves the generalizability of abbreviation disambiguation for clinical note processing, but the advance is incremental: it extends existing methods with data augmentation and added global context.

The paper tackles automated medical abbreviation disambiguation by addressing the scarcity and imbalance of labeled training data, boosting accuracy by almost 14% on the CASI dataset and 4% on i2b2.

Abbreviation disambiguation is important for automated clinical note processing due to the frequent use of abbreviations in clinical settings. Current models for automated abbreviation disambiguation are restricted by the scarcity and imbalance of labeled training data, which decreases their generalizability to orthogonal sources. In this work we propose a novel data augmentation technique that utilizes information from related medical concepts, which improves our model's ability to generalize. Furthermore, we show that incorporating the global context information within the whole medical note (in addition to the traditional local context window) can significantly improve the model's representation for abbreviations. We train our model on a public dataset (MIMIC III) and test its performance on datasets from different sources (CASI, i2b2). Together, these two techniques boost the accuracy of abbreviation disambiguation by almost 14% on the CASI dataset and 4% on i2b2.
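To make the distinction between a local context window and the global context of a whole note concrete, here is a minimal sketch in Python. It is not the authors' model: the function names `local_window` and `global_context`, the window size, the bag-of-words representation, and the sample note are all assumptions chosen for illustration.

```python
from collections import Counter


def local_window(tokens, index, size=3):
    """Return up to `size` tokens on each side of tokens[index]."""
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return left + right


def global_context(tokens, index):
    """Count every token in the note except the abbreviation occurrence itself."""
    return Counter(tokens[:index] + tokens[index + 1:])


# Hypothetical note; "ra" is the ambiguous abbreviation to disambiguate.
note = "pt admitted with chest pain ra pressure elevated on cath".split()
idx = note.index("ra")

features = {
    "local": local_window(note, idx, size=2),   # nearby words only
    "global": global_context(note, idx),        # word counts over the whole note
}
```

In this toy note the local window sees only the four neighbouring words, while the global counts also include "cath", a word outside the window that could help a downstream classifier pick the intended expansion.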
