Extracting clinical concepts from user queries
This work addresses the challenge of extracting clinical entities from ungrammatical user queries for healthcare applications, though it is incremental in that it builds on existing NER methods.
The paper tackles clinical concept extraction from user queries by adapting a BiLSTM-CRF-based NER model, training it on a mixture of annotated clinical notes and synthesized user queries, which improves performance on both user queries and clinical notes.
Clinical concept extraction often begins with clinical Named Entity Recognition (NER). Clinical NER models are typically trained on annotated clinical notes, so they tend to struggle with tagging clinical entities in user queries because of the structural differences between the two kinds of text. User queries, unlike clinical notes, are often ungrammatical and incoherent. In many cases, a user query is composed of multiple clinical entities with no commas or conjunctions separating them. Using a dataset that mixes annotated clinical notes with synthesized user queries, we adapt a clinical NER model based on the BiLSTM-CRF architecture for tagging clinical entities in user queries. Our contributions are the following: 1) We find that, when trained on a mixture of synthesized user queries and clinical notes, the NER model performs better on both user queries and clinical notes. 2) We provide an end-to-end, easy-to-implement framework for clinical concept extraction from user queries.
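To make the data-mixing idea concrete, the following is a minimal sketch of one way synthesized user queries could be built and combined with annotated clinical notes. The abstract does not describe the actual synthesis procedure, so everything here is an assumption made for illustration: the names `LEXICON`, `NOTES`, `tag_entities`, `build_synthetic_queries`, and `mix_datasets` are hypothetical, as are the entity labels and the BIO tagging scheme. The sketch only mirrors the property stated above, that queries often consist of several clinical entities with no separators between them.

```python
import random
from typing import List, Tuple

TaggedToken = Tuple[str, str]        # (word, BIO tag)
TaggedSequence = List[TaggedToken]

# Hypothetical entity lexicon: (surface text, label). Not taken from the paper.
LEXICON: List[Tuple[str, str]] = [
    ("chest pain", "PROBLEM"),
    ("aspirin", "TREATMENT"),
    ("blood test", "TEST"),
    ("shortness of breath", "PROBLEM"),
]

# Hypothetical annotated clinical-note sentence in the same BIO format.
NOTES: List[TaggedSequence] = [
    [("patient", "O"), ("reports", "O"),
     ("chest", "B-PROBLEM"), ("pain", "I-PROBLEM")],
]


def tag_entities(entities: List[Tuple[str, str]]) -> TaggedSequence:
    """Concatenate entities with no separators and emit BIO tags."""
    tagged: TaggedSequence = []
    for text, label in entities:
        for i, word in enumerate(text.split()):
            prefix = "B" if i == 0 else "I"
            tagged.append((word, f"{prefix}-{label}"))
    return tagged


def build_synthetic_queries(lexicon: List[Tuple[str, str]], n_queries: int,
                            max_entities: int = 3,
                            seed: int = 0) -> List[TaggedSequence]:
    """Sample 1..max_entities distinct entities per query (assumed scheme)."""
    rng = random.Random(seed)
    upper = min(max_entities, len(lexicon))
    queries = []
    for _ in range(n_queries):
        k = rng.randint(1, upper)
        queries.append(tag_entities(rng.sample(lexicon, k)))
    return queries


def mix_datasets(notes: List[TaggedSequence], queries: List[TaggedSequence],
                 seed: int = 0) -> List[TaggedSequence]:
    """Combine both sources and shuffle them into one training set."""
    mixed = list(notes) + list(queries)
    random.Random(seed).shuffle(mixed)
    return mixed


if __name__ == "__main__":
    queries = build_synthetic_queries(LEXICON, n_queries=5)
    training_set = mix_datasets(NOTES, queries)
    for sequence in training_set:
        print(" ".join(f"{word}/{tag}" for word, tag in sequence))
```

The resulting token and tag sequences are in the shape a BiLSTM-CRF tagger typically consumes. The mixing ratio between notes and queries is left implicit here and would need to be chosen for a real experiment.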