CL | AI | LG | Oct 2, 2020

Multi-domain Clinical Natural Language Processing with MedCAT: the Medical Concept Annotation Toolkit

arXiv:2010.01165v2 | 213 citations | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of processing electronic health records for clinical analysis and offers a practical tool for healthcare and research. It is an incremental advance, since it builds on existing information extraction technologies.

The authors address the extraction of medical concepts from unstructured clinical text by introducing MedCAT, a toolkit that improves UMLS concept extraction on open datasets (F1 of 0.448-0.738 versus 0.429-0.650 for baselines) and transfers well between hospitals, datasets, and concept types (F1 > 0.94).

Electronic health records (EHR) contain large volumes of unstructured text, requiring the application of Information Extraction (IE) technologies to enable clinical analysis. We present the open-source Medical Concept Annotation Toolkit (MedCAT) that provides: a) a novel self-supervised machine learning algorithm for extracting concepts using any concept vocabulary including UMLS/SNOMED-CT; b) a feature-rich annotation interface for customising and training IE models; and c) integrations to the broader CogStack ecosystem for vendor-agnostic health system deployment. We show improved performance in extracting UMLS concepts from open datasets (F1:0.448-0.738 vs 0.429-0.650). Further real-world validation demonstrates SNOMED-CT extraction at 3 large London hospitals with self-supervised training over ~8.8B words from ~17M clinical records and further fine-tuning with ~6K clinician annotated examples. We show strong transferability (F1 > 0.94) between hospitals, datasets, and concept types indicating cross-domain EHR-agnostic utility for accelerated clinical and research use cases.
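
The F1 figures quoted above summarise how well predicted concept annotations agree with clinician-provided ones. As a rough illustration only, the sketch below computes an exact-match F1 over sets of (start, end, concept ID) tuples. The annotation format, the function name `f1_score`, and the example data are assumptions made for this sketch; the paper's own evaluation protocol may differ (for example in how partial span overlaps are scored).

```python
from typing import Set, Tuple

# (start_offset, end_offset, concept_id): a hypothetical annotation format
# chosen for this sketch, not MedCAT's own output format.
Annotation = Tuple[int, int, str]


def f1_score(gold: Set[Annotation], predicted: Set[Annotation]) -> float:
    """Exact-match F1 between gold and predicted concept annotations."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        # Covers empty inputs and the no-overlap case without dividing by zero.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up example data (placeholder identifiers in UMLS CUI format),
# not taken from the paper.
gold = {(0, 8, "C0011849"), (20, 32, "C0020538"), (40, 46, "C0004096")}
predicted = {(0, 8, "C0011849"), (20, 32, "C0020538"), (50, 55, "C0018681")}

score = f1_score(gold, predicted)
print(f"F1 = {score:.3f}")  # prints "F1 = 0.667"
```

Here two of the three predictions match the gold set exactly, so precision and recall are both 2/3 and the F1 is also 2/3.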

Code Implementations: 1 repo
