CL, LG · Apr 11, 2020

Annotating Social Determinants of Health Using Active Learning, and Characterizing Determinants Using Neural Event Extraction

arXiv:2004.05438v2 · 87 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for data-driven models that extract SDOH from clinical text to inform clinical decision-making. The advance is incremental: it builds on existing information extraction methods, and its main contributions are a new corpus and an annotation framework.

The authors address automatic extraction of social determinants of health (SDOH) from clinical text by building a new annotated corpus (SHAC) and an active learning framework for choosing which samples to annotate. An event extraction model trained on SHAC achieves high performance on data from three institutions (e.g., 0.82-0.93 F1 for substance use status).
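The reported scores are F1, the harmonic mean of precision and recall. The paper's exact criteria for counting an extracted event or argument as correct are not given here, so the sketch below only shows how F1 follows from true positive, false positive, and false negative counts; the counts in the usage line are hypothetical, not results from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true positive (tp), false positive (fp), and false negative (fn) counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall; equivalent to 2*tp / (2*tp + fp + fn).
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only.
print(round(f1_score(tp=90, fp=10, fn=20), 3))  # prints 0.857
```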

Social determinants of health (SDOH) affect health outcomes, and knowledge of SDOH can inform clinical decision-making. Automatically extracting SDOH information from clinical text requires data-driven information extraction models trained on annotated corpora that are heterogeneous and frequently include critical SDOH. This work presents a new corpus with SDOH annotations, a novel active learning framework, and the first extraction results on the new corpus. The Social History Annotation Corpus (SHAC) includes 4,480 social history sections with detailed annotation for 12 SDOH characterizing the status, extent, and temporal information of 18K distinct events. We introduce a novel active learning framework that selects samples for annotation using a surrogate text classification task as a proxy for a more complex event extraction task. The active learning framework successfully increases the frequency of health risk factors and improves automatic extraction of these events over undirected annotation. An event extraction model trained on SHAC achieves high extraction performance for substance use status (0.82-0.93 F1), employment status (0.81-0.86 F1), and living status type (0.81-0.93 F1) on data from three institutions.
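The abstract describes selecting samples for annotation with a surrogate text classification task standing in for the harder event extraction task. A minimal sketch of that idea follows, under stated assumptions: the surrogate classifier is a toy naive Bayes model predicting whether a social history section mentions a risk factor, and the names `NaiveBayes`, `select_batch`, and `run_rounds`, the two selection strategies, and the oracle are all hypothetical. The paper's actual classifier and selection criterion may differ.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class NaiveBayes:
    """Hypothetical surrogate classifier: binary multinomial naive Bayes, add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        self.totals = {c: sum(self.word_counts[c].values()) for c in (0, 1)}
        return self

    def prob_positive(self, text):
        """P(label = 1 | text), e.g. 'section mentions a risk factor' (assumed surrogate label)."""
        n_docs = sum(self.doc_counts.values())
        log_scores = {}
        for c in (0, 1):
            score = math.log((self.doc_counts[c] + 1) / (n_docs + 2))
            for tok in tokenize(text):
                count = self.word_counts[c][tok] + 1
                score += math.log(count / (self.totals[c] + len(self.vocab)))
            log_scores[c] = score
        # Normalize the two log scores into a probability without overflow.
        m = max(log_scores.values())
        e0, e1 = (math.exp(log_scores[c] - m) for c in (0, 1))
        return e1 / (e0 + e1)


def select_batch(model, pool, k, strategy="positive"):
    """Rank unlabeled texts and return the top k.

    'positive': highest predicted risk-factor probability (steers annotation toward rare cases).
    'uncertainty': probability closest to 0.5 (classic uncertainty sampling).
    """
    if strategy == "positive":
        key = lambda t: -model.prob_positive(t)
    else:
        key = lambda t: abs(model.prob_positive(t) - 0.5)
    return sorted(pool, key=key)[:k]


def run_rounds(labeled, pool, oracle, k, rounds, strategy="positive"):
    """Retrain the surrogate, select a batch, and 'annotate' it via the oracle each round."""
    labeled, pool = list(labeled), list(pool)
    for _ in range(rounds):
        model = NaiveBayes().fit([t for t, _ in labeled], [y for _, y in labeled])
        for text in select_batch(model, pool, k, strategy):
            pool.remove(text)
            # In practice a selected section would receive full event annotation;
            # here only its surrogate label is recorded.
            labeled.append((text, oracle(text)))
    return labeled, pool


labeled = [("denies tobacco alcohol or drug use", 0), ("smokes one pack per day", 1),
           ("lives with wife retired teacher", 0), ("drinks six beers daily", 1)]
pool = ["current smoker two packs per day", "lives alone works as nurse",
        "former smoker quit 2010", "no alcohol use"]
model = NaiveBayes().fit([t for t, _ in labeled], [y for _, y in labeled])
print(select_batch(model, pool, 1))  # prints ['current smoker two packs per day']
```

The key design point is that the classifier is cheap to train and retrain after each batch, while the expensive annotation (full events with status, extent, and temporal arguments) is spent only on the samples it selects.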
