CL | Dec 5, 2018

Improving Medical Short Text Classification with Semantic Expansion Using Word-Cluster Embedding

arXiv:1812.01885v1 | 20 citations
Originality: Incremental advance
AI Analysis

This work addresses semantic ambiguity in medical short texts. It is an incremental advance: it builds on existing deep learning methods and adds a novel embedding technique.

The authors tackled the problem of short text classification in electronic medical records by proposing a method that adds word-cluster embedding to deep neural networks, resulting in improved performance over state-of-the-art baselines on both medical and general domain datasets.

Automatic text classification (TC) can be applied to real-world problems such as classifying in-patient discharge summaries and medical text reports, which helps make medical documents more understandable to doctors. However, in electronic medical records (EMR), texts are shorter than those in the general domain, which leads to a lack of semantic features and to semantic ambiguity. To tackle this challenge, we propose adding word-cluster embedding to deep neural networks to improve short text classification. Concretely, we first use hierarchical agglomerative clustering to cluster the word vectors in the semantic space. Then we calculate the cluster center vector, which represents the implicit topic information of the words in the cluster. Finally, we expand each word vector with its cluster center vector and implement classifiers using CNN and LSTM, respectively. To evaluate the proposed method, we conduct experiments on the public TREC data set and on a medical short-sentence data set that we constructed and released. The experimental results demonstrate that the proposed method outperforms state-of-the-art baselines in short sentence classification in both the medical domain and the general domain.
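
The three steps in the abstract (cluster the word vectors, compute each cluster's center, expand each word vector with its center) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The centroid linkage criterion, the use of concatenation as the "expansion" step, the toy two-dimensional vectors, and the function names `agglomerative_clusters` and `expand_embeddings` are all assumptions made for illustration; the abstract does not specify them.

```python
from math import dist


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def agglomerative_clusters(vectors, num_clusters):
    """Naive bottom-up clustering: repeatedly merge the two clusters whose
    centroids are closest (centroid linkage is an assumption here)."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > num_clusters:
        best = None
        for a in range(len(clusters)):
            center_a = centroid([vectors[i] for i in clusters[a]])
            for b in range(a + 1, len(clusters)):
                center_b = centroid([vectors[i] for i in clusters[b]])
                d = dist(center_a, center_b)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def expand_embeddings(word_vectors, num_clusters):
    """Append each word's cluster center to its own vector
    (concatenation is an assumed reading of "expand")."""
    words = list(word_vectors)
    vectors = [word_vectors[w] for w in words]
    expanded = {}
    for members in agglomerative_clusters(vectors, num_clusters):
        center = centroid([vectors[i] for i in members])
        for i in members:
            expanded[words[i]] = list(vectors[i]) + center
    return expanded


# Hypothetical toy embeddings; real word vectors would come from a
# pretrained model and have far more dimensions.
toy = {
    "fever": [1.0, 0.0],
    "cough": [0.9, 0.1],
    "aspirin": [0.0, 1.0],
    "ibuprofen": [0.1, 0.9],
}
expanded = expand_embeddings(toy, num_clusters=2)
```

In this toy setup, words in the same cluster share the same appended center, so the expanded vectors carry a coarse topic signal alongside the word-level signal. The expanded vectors would then be fed to a CNN or LSTM classifier in place of the original embeddings. The naive merge loop is cubic or worse in the vocabulary size, so a library implementation of hierarchical clustering would be preferable for a real vocabulary.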
