Named Entities in Medical Case Reports: Corpus and Experiments
This work addresses the need for structured data in medical text analysis. Its contribution is incremental, however, because it focuses on corpus creation and baseline models rather than on novel methods.
The authors tackle the problem of extracting medical information from case reports by creating a new annotated corpus of medical entities and relations, and they provide baseline systems for tasks such as Named Entity Recognition and Relation Extraction.
We present a new corpus comprising annotations of medical entities in case reports, originating from PubMed Central's open access library. In the case reports, we annotate cases, conditions, findings, factors and negation modifiers. Moreover, where applicable, we annotate relations between these entities. This is the first corpus of this kind in English made available to the scientific community. It enables the initial investigation of automatic information extraction from case reports through tasks like Named Entity Recognition, Relation Extraction and (sentence/paragraph) relevance detection. Additionally, we present four strong baseline systems for detecting the medical entities annotated in the dataset.
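To make the annotation scheme concrete, the following is a minimal sketch of how entity spans and relations of the kinds listed above might be represented and converted to BIO tags for a Named Entity Recognition baseline. It is an illustration under stated assumptions, not the corpus's actual file format: the label strings, the `Entity` and `Relation` classes, the `to_bio` helper, the relation name `modifies` and the example sentence are all hypothetical. The sketch also assumes token-level, non-overlapping spans, which the real annotations may not satisfy.

```python
from dataclasses import dataclass
from typing import List

# Entity types named in the abstract; the exact label strings are assumptions.
LABELS = {"case", "condition", "finding", "factor", "negation"}


@dataclass(frozen=True)
class Entity:
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive)
    label: str


@dataclass(frozen=True)
class Relation:
    head: Entity
    tail: Entity
    label: str  # hypothetical relation name


def to_bio(tokens: List[str], entities: List[Entity]) -> List[str]:
    """Convert token-level entity spans to one BIO tag per token."""
    tags = ["O"] * len(tokens)
    for ent in entities:
        if ent.label not in LABELS:
            raise ValueError(f"unknown label: {ent.label}")
        if not 0 <= ent.start < ent.end <= len(tokens):
            raise ValueError(f"span out of range: {ent}")
        # This sketch rejects overlapping spans instead of resolving them.
        if any(tag != "O" for tag in tags[ent.start:ent.end]):
            raise ValueError(f"overlapping span: {ent}")
        tags[ent.start] = "B-" + ent.label
        for i in range(ent.start + 1, ent.end):
            tags[i] = "I-" + ent.label
    return tags


# Invented example sentence, not taken from the corpus.
tokens = "The patient denied chest pain".split()
negation = Entity(2, 3, "negation")
finding = Entity(3, 5, "finding")
relation = Relation(head=negation, tail=finding, label="modifies")
tags = to_bio(tokens, [negation, finding])
```

A sequence tagger would be trained on the `tokens` and `tags` pairs, while a relation extraction model would additionally consume pairs of entities such as the `relation` above.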