CL · Jun 11, 2018

A Corpus with Multi-Level Annotations of Patients, Interventions and Outcomes to Support Language Processing for Medical Literature

arXiv:1806.04185v11138 citations
Originality: Synthesis-oriented
AI Analysis

The corpus supports language processing for medical literature and aids evidence-based medicine, but the contribution is incremental because it builds on existing PICO annotation frameworks.

The authors tackled the problem of extracting structured information from medical literature by creating a corpus of 5,000 annotated abstracts from clinical trials, with detailed annotations for patients, interventions, and outcomes, including mappings to a medical vocabulary.

We present a corpus of 5,000 richly annotated abstracts of medical articles describing clinical randomized controlled trials. Annotations include demarcations of text spans that describe the Patient population enrolled, the Interventions studied and to what they were Compared, and the Outcomes measured (the 'PICO' elements). These spans are further annotated at a more granular level, e.g., individual interventions within them are marked and mapped onto a structured medical vocabulary. We acquired annotations from a diverse set of workers with varying levels of expertise and cost. We describe our data collection process and the corpus itself in detail. We then outline a set of challenging NLP tasks that would aid searching of the medical literature and the practice of evidence-based medicine.
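The two annotation levels described in the abstract (a PICO span, with finer-grained spans nested inside it) can be pictured with a small data structure. The sketch below is only an illustration under stated assumptions: the class names `PicoSpan` and `FineSpan`, the `vocab_id` field, the label strings, and the example sentence are all hypothetical and do not reflect the corpus's actual file format or label set.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FineSpan:
    """A granular annotation nested inside a PICO span (hypothetical schema)."""
    start: int                      # token offset, inclusive
    end: int                        # token offset, exclusive
    label: str                      # illustrative label, not taken from the corpus
    vocab_id: Optional[str] = None  # placeholder for a mapping to a medical vocabulary


@dataclass
class PicoSpan:
    """A top-level span marking a Patient, Intervention or Outcome description."""
    start: int
    end: int
    element: str                    # "P", "I" or "O"
    children: List[FineSpan] = field(default_factory=list)

    def add_child(self, child: FineSpan) -> None:
        # Enforce the nesting: a granular span must lie inside its parent span.
        if not (self.start <= child.start and child.end <= self.end):
            raise ValueError("fine-grained span must lie inside its parent span")
        self.children.append(child)


def span_text(tokens, span):
    """Return the text covered by a span as a space-joined string."""
    return " ".join(tokens[span.start:span.end])


# Made-up example sentence, tokenized on whitespace.
tokens = "Patients with asthma received inhaled budesonide or placebo".split()
intervention = PicoSpan(start=4, end=8, element="I")
intervention.add_child(FineSpan(start=5, end=6, label="drug"))
```

Keeping the granular spans as children of the coarse span makes the containment constraint explicit, so an annotation that falls outside its parent is rejected when it is added.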

Code Implementations: 2 repos