CL LG Jul 2, 2020

NLNDE: The Neither-Language-Nor-Domain-Experts' Way of Spanish Medical Document De-Identification

Lukas Lange, Heike Adel, Jannik Strötgen

arXiv:2007.01030v10.718 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for secure processing of medical documents such as patient notes and clinical trials, but it is incremental: it applies existing sequence-labeling methods to a specific language and domain.

The paper tackles the de-identification of privacy-sensitive information in Spanish medical documents by developing the NLNDE system for the MEDDOCAN competition, where it achieves promising results in a non-standard language and domain setting.

Natural language processing has huge potential in the medical domain, which has recently led to a lot of research in this field. However, a prerequisite for the secure processing of medical documents, e.g., patient notes and clinical trials, is the proper de-identification of privacy-sensitive information. In this paper, we describe our NLNDE system, with which we participated in the MEDDOCAN competition, the medical document anonymization task of IberLEF 2019. We address the task of detecting and classifying protected health information in Spanish data as a sequence-labeling problem and investigate different embedding methods for our neural network. Despite dealing with a non-standard language and domain setting, the NLNDE system achieves promising results in the competition.
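To make the sequence-labeling framing concrete, the following is a minimal sketch of how per-token labels can be turned into de-identified text. It is not the NLNDE system: the BIO tagging scheme, the label names `NAME` and `LOCATION`, the example sentence, and the helper names `bio_to_spans` and `redact` are all assumptions chosen for illustration, and the tags are written by hand where a trained neural tagger would normally predict them.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) entity spans."""
    spans: List[Tuple[str, str]] = []
    current_label: Optional[str] = None
    current_tokens: List[str] = []
    for token, tag in zip(tokens, tags):
        label = tag[2:] if tag != "O" else None
        # A span starts at a B- tag, or at an I- tag whose label
        # does not continue the span currently being built.
        starts_new = label is not None and (
            tag.startswith("B-") or label != current_label
        )
        if label is None or starts_new:
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label = label
            current_tokens = [token] if label is not None else []
        else:
            current_tokens.append(token)
    if current_label is not None:
        spans.append((current_label, " ".join(current_tokens)))
    return spans


def redact(tokens: List[str], tags: List[str]) -> str:
    """Replace each tagged span with a single [LABEL] placeholder."""
    out: List[str] = []
    previous_label: Optional[str] = None
    for token, tag in zip(tokens, tags):
        label = tag[2:] if tag != "O" else None
        if label is None:
            out.append(token)
        elif tag.startswith("B-") or label != previous_label:
            out.append(f"[{label}]")
        previous_label = label
    return " ".join(out)


# Hypothetical example sentence with hand-written tags (illustrative labels only).
tokens = ["Paciente", "Juan", "Pérez", "ingresó", "en", "Madrid"]
tags = ["O", "B-NAME", "I-NAME", "O", "O", "B-LOCATION"]

print(bio_to_spans(tokens, tags))
print(redact(tokens, tags))
```

In this framing, detection and classification happen in a single pass: the tag prefix marks span boundaries, and the tag suffix carries the category of protected health information.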
