LG · CL · ML · Dec 12, 2019

Training without training data: Improving the generalizability of automated medical abbreviation disambiguation

arXiv:1912.06174v1 · 3 citations
Originality: Incremental advance
AI Analysis

This work improves the generalizability of clinical note processing, but it is an incremental advance: it builds on existing methods, adding data augmentation and richer context.

The paper tackles automated medical abbreviation disambiguation by addressing the scarcity and imbalance of labeled training data, improving accuracy by almost 14% on the CASI dataset and 4% on the i2b2 dataset.

Abbreviation disambiguation is important for automated clinical note processing due to the frequent use of abbreviations in clinical settings. Current models for automated abbreviation disambiguation are restricted by the scarcity and imbalance of labeled training data, decreasing their generalizability to orthogonal sources. In this work we propose a novel data augmentation technique that utilizes information from related medical concepts, which improves our model's ability to generalize. Furthermore, we show that incorporating the global context information within the whole medical note (in addition to the traditional local context window) can significantly improve the model's representation for abbreviations. We train our model on a public dataset (MIMIC III) and test its performance on datasets from different sources (CASI, i2b2). Together, these two techniques boost the accuracy of abbreviation disambiguation by almost 14% on the CASI dataset and 4% on i2b2.
