CL / IR, Apr 2, 2019

Short Text Classification Improved by Feature Space Extension

arXiv:1904.01313v1, 1 citation
Originality: Incremental advance
AI Analysis

The paper addresses the sparsity problem in short text classification, which matters for applications such as the mobile internet, but the advance is incremental because it builds on existing CNN and LDA methods.

The paper tackles short text classification by proposing a topic-based convolutional neural network (TB-CNN) that uses LDA to generate topic words and extend the feature space, showing improvement on the IMDB movie review dataset.

With the explosive development of the mobile Internet, short text has come into extensive use. Short text differs from long documents in that it is brief and sparse, so it is challenging to classify because it carries less semantic information. In this paper, we propose a novel topic-based convolutional neural network (TB-CNN) based on the Latent Dirichlet Allocation (LDA) model and a convolutional neural network. Compared with traditional CNN methods, TB-CNN generates topic words with the LDA model to reduce sparseness and combines the embedding vectors of topic words and input words to extend the feature space of short text. Validation results on the IMDB movie review dataset show the improvement and effectiveness of TB-CNN.
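The feature space extension described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how topic words are selected or how the two sets of embeddings are combined, so this sketch assumes the topic words are simply appended to the input tokens before embedding lookup. The topic-word table, the `dominant_topic` heuristic, the random embeddings, and all function names are hypothetical stand-ins for a trained LDA model and a learned embedding layer.

```python
import random

# Hypothetical toy topic-word weights, standing in for a trained LDA model.
TOPIC_WORDS = {
    0: {"film": 0.30, "actor": 0.25, "plot": 0.20, "scene": 0.15},
    1: {"phone": 0.35, "battery": 0.30, "screen": 0.20, "app": 0.10},
}

EMBED_DIM = 4  # assumed embedding size, chosen only for illustration


def dominant_topic(tokens):
    """Pick the topic with the highest summed word weight.

    This is a crude stand-in for real LDA inference, used here for brevity.
    """
    scores = {
        topic: sum(words.get(tok, 0.0) for tok in tokens)
        for topic, words in TOPIC_WORDS.items()
    }
    return max(scores, key=scores.get)


def top_topic_words(topic, k):
    """Return the k highest-weighted words of a topic."""
    words = TOPIC_WORDS[topic]
    return sorted(words, key=words.get, reverse=True)[:k]


def build_embeddings(vocab, dim, seed=0):
    """Create random embeddings; a real model would learn or load these."""
    rng = random.Random(seed)
    return {w: [rng.uniform(-1, 1) for _ in range(dim)] for w in vocab}


def extend_features(tokens, embeddings, k=2):
    """Append topic words to the input and look up one embedding row per word."""
    topic_words = top_topic_words(dominant_topic(tokens), k)
    extended = tokens + topic_words
    matrix = [embeddings[w] for w in extended]
    return extended, matrix


tokens = ["great", "plot", "twist"]
vocab = sorted(set(tokens) | {w for ws in TOPIC_WORDS.values() for w in ws})
embeddings = build_embeddings(vocab, EMBED_DIM)
extended_tokens, input_matrix = extend_features(tokens, embeddings)
print(extended_tokens)
print(len(input_matrix), "x", EMBED_DIM)
```

Under these assumptions, the three-word input grows to five rows, and the resulting matrix is what a CNN's convolution layer would consume in place of the original, sparser input.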
