CL | Jan 20, 2018

Efficient Text Classification Using Tree-structured Multi-linear Principal Component Analysis

arXiv:1801.06607v2 | 25 citations
AI Analysis

This work addresses efficiency issues in text classification for practitioners needing faster processing, though it appears incremental as it builds on PCA methods.

The authors address the high computational complexity of text classification by proposing a dimension reduction technique called tree-structured multi-linear principal component analysis (TMPCA). TMPCA reduces the dimension of whole input sequences rather than individual word representations. It is reported to require much lower complexity than ordinary PCA, and an SVM applied to TMPCA-processed data achieves comparable or better performance than state-of-the-art RNNs.

A novel text data dimension reduction technique, called the tree-structured multi-linear principal component analysis (TMPCA), is proposed in this work. Being different from traditional text dimension reduction methods that deal with the word-level representation, the TMPCA technique reduces the dimension of input sequences and sentences to simplify the following text classification tasks. It is shown mathematically and experimentally that the TMPCA tool demands much lower complexity (and, hence, less computing power) than the ordinary principal component analysis (PCA). Furthermore, it is demonstrated by experimental results that the support vector machine (SVM) method applied to the TMPCA-processed data achieves commensurable or better performance than the state-of-the-art recurrent neural network (RNN) approach.
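The abstract does not spell out the procedure, so the following is only a rough sketch of one way a tree-structured PCA over a sequence could look. It assumes that every sentence is padded to a length N that is a power of two, that each word is a D-dimensional embedding, and that each tree level concatenates adjacent non-overlapping pairs into 2D-dimensional vectors and projects them back to D dimensions with an ordinary PCA. The names `fit_pca` and `tmpca_fit_transform` are hypothetical and are not taken from the paper.

```python
import numpy as np


def fit_pca(X, k):
    """Return (mean, components) for the top-k principal directions of the rows of X."""
    mean = X.mean(axis=0)
    # SVD of the centered data; the rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]


def tmpca_fit_transform(seqs):
    """Reduce each (N, D) sequence to a single D-dimensional vector.

    seqs: array of shape (num_sentences, N, D), N a power of two (assumption).
    Returns the reduced array of shape (num_sentences, D) and the list of
    per-level (mean, components) pairs, one per tree level.
    """
    S, N, D = seqs.shape
    if N < 1 or N & (N - 1) != 0:
        raise ValueError("sequence length must be a power of two")
    levels = []
    x = seqs
    while x.shape[1] > 1:
        # Concatenate adjacent, non-overlapping positions: (S, n, D) -> (S * n/2, 2D).
        flat = x.reshape(-1, 2 * D)
        # Each level only needs a PCA on 2D-dimensional vectors (assumed design).
        mean, comps = fit_pca(flat, D)
        x = ((flat - mean) @ comps.T).reshape(S, -1, D)
        levels.append((mean, comps))
    return x[:, 0, :], levels
```

Under these assumptions, a sequence of length N passes through log2(N) levels, and every level fits a PCA on vectors of size 2D instead of one PCA on the full N*D-dimensional sentence vector. This is one plausible reading of where the lower complexity relative to ordinary PCA would come from. The reduced vectors could then be passed to any SVM implementation.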
