CL · LG · Jul 2, 2020

NLNDE: The Neither-Language-Nor-Domain-Experts' Way of Spanish Medical Document De-Identification

arXiv:2007.01030v1 · 18 citations
AI Analysis

This work addresses the need for secure processing of medical documents such as patient notes and clinical trials, but it is incremental, since it applies existing sequence-labeling methods to a specific language and domain.

The paper tackled the problem of de-identifying privacy-sensitive information in Spanish medical documents by developing the NLNDE system for the MEDDOCAN competition, achieving promising results in this non-standard language and domain setting.

Natural language processing has huge potential in the medical domain, which has recently led to a lot of research in this field. However, a prerequisite for the secure processing of medical documents, e.g., patient notes and clinical trials, is the proper de-identification of privacy-sensitive information. In this paper, we describe our NLNDE system, with which we participated in the MEDDOCAN competition, the medical document anonymization task of IberLEF 2019. We address the task of detecting and classifying protected health information from Spanish data as a sequence-labeling problem and investigate different embedding methods for our neural network. Despite dealing with a non-standard language and domain setting, the NLNDE system achieves promising results in the competition.
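Treating de-identification as sequence labeling means turning annotated character spans of protected health information into one tag per token, which a neural tagger can then learn to predict. The sketch below illustrates only that conversion step, assuming a BIO tagging scheme and a simple regex tokenizer; the function names, the label names `NAME` and `AGE`, and the example sentence are hypothetical and are not taken from the NLNDE system or the MEDDOCAN data.

```python
import re
from typing import List, Tuple

# (start_char, end_char, label) with end exclusive; labels are hypothetical
Span = Tuple[int, int, str]


def tokenize(text: str) -> List[Tuple[str, int, int]]:
    """Split into word and punctuation tokens, keeping character offsets.

    Assumption: a simple regex tokenizer; \\w is Unicode-aware for str
    patterns, so Spanish accented letters stay inside words.
    """
    return [(m.group(), m.start(), m.end())
            for m in re.finditer(r"\w+|[^\w\s]", text)]


def bio_tags(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Map character-level PHI spans to token-level BIO tags."""
    tokens = tokenize(text)
    tags = ["O"] * len(tokens)  # default: not protected information
    for start, end, label in spans:
        # Tokens that lie fully inside the annotated span
        inside = [i for i, (_, s, e) in enumerate(tokens)
                  if s >= start and e <= end]
        for k, i in enumerate(inside):
            # First token of the span begins it, the rest continue it
            tags[i] = ("B-" if k == 0 else "I-") + label
    return [(tok, tag) for (tok, _, _), tag in zip(tokens, tags)]


if __name__ == "__main__":
    text = "Paciente: Juan Pérez, 45 años."
    spans = [(10, 20, "NAME"), (22, 24, "AGE")]
    for token, tag in bio_tags(text, spans):
        print(f"{token}\t{tag}")
```

For the example sentence this yields `B-NAME`/`I-NAME` on "Juan Pérez", `B-AGE` on "45", and `O` elsewhere; a sequence labeler then predicts such tags, and contiguous `B-`/`I-` runs are decoded back into PHI spans.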

