CL | Aug 30, 2024

Improving Extraction of Clinical Event Contextual Properties from Electronic Health Records: A Comparative Study

arXiv:2408.17181v1 | 2 citations | h-index: 16
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for accurate contextualization of clinical events to support downstream applications such as disease prediction, but it is incremental: it builds on existing methods such as MedCAT.

The study addresses the extraction of contextual properties of clinical events from unstructured electronic health records by comparing natural language models for text classification. It finds that transformer-based models such as BERT, when combined with class imbalance mitigation, improve minority-class recall by up to 28% over Bi-LSTM models and by up to 16% over the baseline BERT model.

Electronic Health Records are large repositories of valuable clinical data, with a significant portion stored as unstructured text. This text describes clinical events (e.g., disorders, symptoms, findings, medications and procedures) in context; if extracted accurately and at scale, they can unlock valuable downstream applications such as disease prediction. Once concepts have been identified with an existing Named Entity Recognition and Linking methodology, MedCAT, they need to be further classified (contextualised), for example by their relevance to the patient and by their temporal and negated status, to be useful downstream. This study performs a comparative analysis of various natural language models for medical text classification. Extensive experimentation reveals the effectiveness of transformer-based language models, particularly BERT. When combined with class imbalance mitigation techniques, BERT outperforms Bi-LSTM models by up to 28% and the baseline BERT model by up to 16% in recall of the minority classes. The method has been implemented as part of the CogStack/MedCAT framework and made available to the community for further research.
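
The abstract does not say which class imbalance mitigation techniques were used. As a minimal sketch of the general idea, assuming inverse-frequency class weighting (one common option, not necessarily the authors' choice), the snippet below computes per-class weights and per-class recall, the metric the reported gains refer to. The labels, predictions, and function names are hypothetical illustrations and are not part of the MedCAT API.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive larger weights, so a weighted loss penalises
    mistakes on them more heavily. (Assumed technique, for illustration.)
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def per_class_recall(y_true, y_pred):
    """Recall per class: correct predictions / true instances of that class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: hits[cls] / totals[cls] for cls in totals}


# Hypothetical negation-status labels for six detected concepts.
y_true = ["affirmed", "affirmed", "affirmed", "affirmed", "negated", "negated"]
y_pred = ["affirmed", "affirmed", "affirmed", "affirmed", "affirmed", "negated"]

weights = inverse_frequency_weights(y_true)
recall = per_class_recall(y_true, y_pred)
print(weights)
print(recall)
```

In this toy data the majority class is recalled perfectly while half of the minority class is missed, which is why the study reports recall on the minority classes rather than overall accuracy. Weights of this kind could, for instance, be passed to a weighted cross-entropy loss when fine-tuning a classifier.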
