HC CL LG | Jul 16, 2019

MedCATTrainer: A Biomedical Free Text Annotation Interface with Active Learning and Research Use Case Specific Customisation

Thomas Searle, Zeljko Kraljevic, Rebecca Bendayan, Daniel Bean, Richard Dobson

arXiv:1907.07322v164.5999 citations

Originality: Incremental advance

AI Analysis

MedCATTrainer addresses the difficulty of obtaining specialist-labelled data for biomedical text analysis, which enables more effective secondary use of clinical data for research. The advance is incremental: it builds on existing NER+L methods and adds interface improvements.

The paper tackles the problem of collecting labelled data for biomedical named entity recognition and linking (NER+L) models. It introduces MedCATTrainer, an interactive web interface that uses active learning to improve model accuracy and supports customisation for specific research use cases. Initial results suggest that the approach collects training data efficiently and accurately.

We present MedCATTrainer, an interface for building, improving and customising a given Named Entity Recognition and Linking (NER+L) model for biomedical domain text. NER+L is often used as a first step in deriving value from clinical text. Collecting labelled data for training models is difficult due to the need for specialist domain knowledge. MedCATTrainer offers an interactive web interface to inspect and improve recognised entities from an underlying NER+L model via active learning. Secondary use of data for clinical research often has task- and context-specific criteria. MedCATTrainer provides a further interface to define and collect supervised learning training data for researcher-specific use cases. Initial results suggest our approach allows for efficient and accurate collection of research use case specific training data.
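
The abstract does not say which active learning strategy MedCATTrainer uses. As a rough illustration of the general idea only, the sketch below implements plain uncertainty sampling: the annotator is shown the entity links the model is least sure about, and each answer updates the model. Every name here (`ToyLinker`, `select_for_review`, `annotate_round`) is hypothetical and is not part of MedCAT or MedCATTrainer, and the mention/concept pairs are made-up examples.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLinker:
    """Hypothetical stand-in for an NER+L model (not the MedCAT API).

    Tracks, per (mention, concept) pair, how often an annotator
    confirmed or rejected the link.
    """
    counts: dict = field(default_factory=dict)  # (mention, concept) -> [correct, incorrect]

    def confidence(self, mention, concept):
        correct, incorrect = self.counts.get((mention, concept), [0, 0])
        # Laplace-smoothed estimate: unseen pairs start at 0.5 (maximally uncertain).
        return (correct + 1) / (correct + incorrect + 2)

    def update(self, mention, concept, is_correct):
        entry = self.counts.setdefault((mention, concept), [0, 0])
        entry[0 if is_correct else 1] += 1


def select_for_review(model, candidates, budget):
    """Uncertainty sampling: return the `budget` candidates closest to 0.5 confidence."""
    return sorted(candidates, key=lambda c: abs(model.confidence(*c) - 0.5))[:budget]


def annotate_round(model, candidates, oracle, budget=2):
    """One round: pick the most uncertain links, ask the annotator, update the model."""
    chosen = select_for_review(model, candidates, budget)
    for mention, concept in chosen:
        model.update(mention, concept, oracle(mention, concept))
    return chosen
```

In this sketch `oracle` stands in for the human annotator clicking "correct" or "incorrect" in a web interface. A real system would also retrain or fine-tune the underlying model rather than keep simple counts.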
