CL CR LG, Aug 18, 2021

De-identification of Unstructured Clinical Texts from Sequence to Sequence Perspective

Md Monowar Anjum, Noman Mohammed, Xiaoqian Jiang

arXiv:2108.07971v21.04 citations

Originality: Incremental advance

AI Analysis

This work addresses privacy protection in healthcare by de-identifying clinical texts, but it is incremental as it adapts an existing method to a new problem formulation.

The paper tackles de-identification of unstructured clinical text by formulating it as a sequence-to-sequence learning problem instead of a token classification problem. In early experiments it achieves a 98.91% recall rate on the i2b2 dataset, which the authors describe as comparable to current state-of-the-art models.

In this work, we propose a novel problem formulation for de-identification of unstructured clinical text. We formulate the de-identification problem as a sequence-to-sequence learning problem instead of a token classification problem. Our approach is inspired by the recent state-of-the-art performance of sequence-to-sequence learning models for named entity recognition. Early experiments with our proposed approach achieved a 98.91% recall rate on the i2b2 dataset. This performance is comparable to current state-of-the-art models for unstructured clinical text de-identification.
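To make the contrast between the two formulations concrete, here is a minimal sketch of how one annotated sentence could be presented to a token classifier versus a sequence-to-sequence model. The example sentence, the BIO label scheme, the bracketed placeholder format (`[NAME]`, `[DATE]`), and the function names are all assumptions for illustration; the abstract does not specify the target format the authors actually use.

```python
from typing import List, Tuple

# A PHI span is (start_token_index, end_token_index_exclusive, category).
Span = Tuple[int, int, str]


def to_token_labels(tokens: List[str], phi_spans: List[Span]) -> List[str]:
    """Token-classification view: one BIO label per input token."""
    labels = ["O"] * len(tokens)
    for start, end, kind in phi_spans:
        labels[start] = f"B-{kind}"
        for i in range(start + 1, end):
            labels[i] = f"I-{kind}"
    return labels


def to_seq2seq_pair(tokens: List[str], phi_spans: List[Span]) -> Tuple[str, str]:
    """Seq2seq view: (source text, target text with PHI spans replaced).

    The bracketed placeholder is a hypothetical target format, not the paper's.
    """
    span_at = {start: (end, kind) for start, end, kind in phi_spans}
    target: List[str] = []
    i = 0
    while i < len(tokens):
        if i in span_at:
            end, kind = span_at[i]
            target.append(f"[{kind}]")  # one placeholder per whole span
            i = end
        else:
            target.append(tokens[i])
            i += 1
    return " ".join(tokens), " ".join(target)


# Invented example sentence; not drawn from the i2b2 dataset.
tokens = "Mr. John Smith was admitted on 03/14/2019 .".split()
phi_spans = [(1, 3, "NAME"), (6, 7, "DATE")]

labels = to_token_labels(tokens, phi_spans)
source, target = to_seq2seq_pair(tokens, phi_spans)
print(labels)
print(source)
print(target)
```

In the token-classification view the model must emit exactly one label per token, while in the sequence-to-sequence view it generates a new text whose length may differ from the input.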
