A Joint Neural Baseline for Concept, Assertion, and Relation Extraction from Clinical Text
This work provides a strong joint baseline for clinical information extraction that researchers and practitioners working with clinical text can compare against.
This paper addresses the underexplored problem of jointly modeling concept recognition, assertion classification, and relation extraction in clinical text. The proposed end-to-end system outperforms the pipeline baseline, with F1 improvements of +0.3 for concept recognition, +1.4 for assertion classification, and +3.1 for relation extraction.
Clinical information extraction (e.g., the 2010 i2b2/VA challenge) usually comprises three tasks: concept recognition, assertion classification, and relation extraction. Jointly modeling these multi-stage tasks in the clinical domain is an underexplored topic. Moreover, in the existing independent task setting, each stage receives reference (gold) inputs, so joint models are not directly comparable to existing pipeline work. To address these issues, we define a joint task setting and propose a novel end-to-end system that jointly optimizes all three stages. We evaluate our proposal and the pipeline baseline in this joint setting with various embedding techniques: word, contextual, and in-domain contextual embeddings. The proposed joint system substantially outperforms the pipeline baseline, by +0.3, +1.4, and +3.1 F1 on concept recognition, assertion classification, and relation extraction, respectively. This work bridges joint approaches and clinical information extraction, and the proposed approach could serve as a strong joint baseline for future research. The code is publicly available.
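To make the difference between the independent and joint task settings concrete, here is a minimal Python sketch. It is not the authors' evaluation code. It assumes exact span-and-type matching for concepts, micro F1, and a rule that an assertion or relation counts as correct only if its argument concepts are also exactly correct. The class names, the toy sentence, and the i2b2-style labels such as "TrAP" are illustrative assumptions.

```python
from typing import NamedTuple, Set


class Concept(NamedTuple):
    start: int  # first token index (inclusive)
    end: int    # last token index (exclusive)
    label: str  # concept type, e.g. "problem", "test", "treatment"


class Relation(NamedTuple):
    head: Concept
    tail: Concept
    label: str  # relation type; "TrAP" here is an illustrative i2b2-style label


def f1(pred: Set, gold: Set) -> float:
    """Micro F1 under exact matching of hashable items (assumed criterion)."""
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Gold annotations for one hypothetical toy sentence.
gold_problem = Concept(0, 2, "problem")
gold_treatment = Concept(5, 6, "treatment")
gold_concepts = {gold_problem, gold_treatment}
gold_assertions = {(gold_problem, "present")}
gold_relations = {Relation(gold_treatment, gold_problem, "TrAP")}

# Hypothetical system output: the problem span is one token too short.
pred_problem = Concept(0, 1, "problem")
pred_concepts = {pred_problem, gold_treatment}
pred_assertions = {(pred_problem, "present")}
pred_relations = {Relation(gold_treatment, pred_problem, "TrAP")}

# Joint setting (assumed rule): assertions and relations are matched together
# with their argument spans, so the upstream boundary error propagates.
joint_scores = {
    "concept": f1(pred_concepts, gold_concepts),
    "assertion": f1(pred_assertions, gold_assertions),
    "relation": f1(pred_relations, gold_relations),
}

# Independent setting: the relation stage receives gold concepts,
# so the same relation decision is scored as correct.
independent_relation = f1({Relation(gold_treatment, gold_problem, "TrAP")}, gold_relations)

print(joint_scores)          # {'concept': 0.5, 'assertion': 0.0, 'relation': 0.0}
print(independent_relation)  # 1.0
```

A single boundary error costs half the concept F1 and all of the assertion and relation F1 in the joint setting, yet it is invisible to the relation stage in the independent setting. This is why scores from the two settings are not directly comparable.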