CL · LG · May 19, 2020

Closing the Gap: Joint De-Identification and Concept Extraction in the Clinical Domain

arXiv:2005.09397v1999 citations
AI Analysis

This work addresses the need for integrated privacy protection and information extraction in clinical NLP, offering a practical solution for healthcare data analysis.

The paper tackles de-identification and concept extraction in clinical texts jointly, two tasks that had previously been studied only in isolation. It reports state-of-the-art results on benchmark datasets: 96.1% F1 for de-identification and 88.9% F1 for concept extraction in English, and 91.4% F1 for concept extraction in Spanish.

Exploiting natural language processing in the clinical domain requires de-identification, i.e., anonymization of personal information in texts. However, current research considers de-identification and downstream tasks, such as concept extraction, only in isolation and does not study the effects of de-identification on other tasks. In this paper, we close this gap by reporting concept extraction performance on automatically anonymized data and investigating joint models for de-identification and concept extraction. In particular, we propose a stacked model with restricted access to privacy-sensitive information and a multitask model. We set the new state of the art on benchmark datasets in English (96.1% F1 for de-identification and 88.9% F1 for concept extraction) and Spanish (91.4% F1 for concept extraction).
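The evaluation setup described in the abstract (concept extraction scored on automatically anonymized text) can be illustrated with a minimal sketch. This is not the authors' stacked or multitask model: the helper names `mask_phi` and `span_f1`, the placeholder format, the example sentence, and the labels are all assumptions made for illustration, and the spans are hand-written stand-ins for model predictions. The metric shown is exact-match span-level F1, which may differ in detail from the scoring used in the paper.

```python
from typing import List, Set, Tuple

# (start token index, end token index exclusive, label); hypothetical format
Span = Tuple[int, int, str]


def mask_phi(tokens: List[str], phi_spans: List[Span]) -> List[str]:
    """Replace each token inside a PHI span with a label placeholder.

    Masking token by token keeps the sequence length unchanged, so
    concept spans defined on the original text stay aligned.
    """
    masked = list(tokens)
    for start, end, label in phi_spans:
        for i in range(start, end):
            masked[i] = f"<{label}>"
    return masked


def span_f1(gold: Set[Span], pred: Set[Span]) -> float:
    """Exact-match span-level F1 (harmonic mean of precision and recall)."""
    true_positives = len(gold & pred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy sentence; the PHI span stands in for a de-identification model's output.
tokens = ["Mr.", "Smith", "was", "given", "aspirin", "for", "chest", "pain"]
phi_spans = [(1, 2, "NAME")]
masked = mask_phi(tokens, phi_spans)

# Stand-ins for gold annotations and a concept extractor's predictions
# on the masked text (illustrative labels only).
gold_concepts = {(4, 5, "treatment"), (6, 8, "problem")}
pred_concepts = {(4, 5, "treatment"), (6, 7, "problem")}
score = span_f1(gold_concepts, pred_concepts)
```

In this toy case one of the two predicted concept spans matches a gold span exactly, so precision and recall are both 0.5. A real study of the kind the abstract describes would compare this score on original versus anonymized text to measure the effect of de-identification on the downstream task.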

Code Implementations: 1 repo