CL LG Oct 15, 2020

Where's the Question? A Multi-channel Deep Convolutional Neural Network for Question Identification in Textual Data

George Michalopoulos, Helen Chen, Alexander Wong

arXiv:2010.07816v1993 citations Has Code

Originality Incremental advance

AI Analysis

This work addresses the issue of inaccurate clinical documentation in healthcare settings, though it is incremental, as it builds on existing deep learning methods for question identification.

The study tackled the problem of automatically identifying 'real questions' in clinical dialogues to uncover ambiguities in data capturing, and the proposed Quest-CNN achieved the best F1 score on both a dialysis care dataset and a general domain dataset.

In most clinical practice settings, there is no rigorous review of clinical documentation, resulting in inaccurate information being captured in patient medical records. The gold standard in clinical data capturing is achieved via "expert-review", where clinicians can have a dialogue with domain experts (reviewers) and ask them questions about data entry rules. Automatically identifying "real questions" in these dialogues could uncover ambiguities or common problems in data capturing in a given clinical setting. In this study, we proposed a novel multi-channel deep convolutional neural network architecture, namely Quest-CNN, for the purpose of separating real questions that expect an answer (information or help) about an issue from sentences that are not questions, as well as from questions referring to an issue mentioned in a nearby sentence (e.g., can you clarify this?), which we refer to as "c-questions". We conducted a comprehensive performance comparison of the proposed multi-channel deep convolutional neural network against other deep neural networks. Furthermore, we evaluated the performance of traditional rule-based and learning-based methods for detecting question sentences. The proposed Quest-CNN achieved the best F1 score on both a dataset of data entry-review dialogues in a dialysis care setting and a general-domain dataset.
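
To make the three-way task and the F1 metric concrete, here is a minimal sketch of a naive rule-based baseline of the kind the abstract contrasts with Quest-CNN. It is not the authors' method or their rule set: the cue word lists, the six-token length threshold, and the names `classify` and `f1_score` are all assumptions chosen for illustration.

```python
import re

# Hypothetical cue lists for illustration only; none of these come from the paper.
WH_WORDS = {"what", "when", "where", "which", "who", "whom", "whose", "why", "how"}
AUX_WORDS = {"is", "are", "was", "were", "do", "does", "did", "can", "could",
             "should", "would", "will", "has", "have"}
CONTEXT_WORDS = {"this", "that", "it", "these", "those"}


def classify(sentence):
    """Label a sentence 'question', 'c-question', or 'other' using naive rules."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    if not tokens:
        return "other"
    looks_like_question = (
        sentence.strip().endswith("?")
        or tokens[0] in WH_WORDS
        or tokens[0] in AUX_WORDS
    )
    if not looks_like_question:
        return "other"
    # Assumption: a short question that leans on a pronoun probably refers
    # to an issue stated in a nearby sentence, i.e. a "c-question".
    if len(tokens) <= 6 and CONTEXT_WORDS & set(tokens):
        return "c-question"
    return "question"


def f1_score(gold, predicted, label):
    """F1 for one label: the harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Surface rules like these cannot tell a rhetorical or context-dependent question from one that genuinely expects an answer, which is the gap a learned classifier is meant to close.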
