CL · IR · LG · Sep 8, 2021

Bag-of-Words vs. Graph vs. Sequence in Text Classification: Questioning the Necessity of Text-Graphs and the Surprising Strength of a Wide MLP

arXiv:2109.03777v3 · 642 citations
Originality: Incremental advance
AI Analysis

This work examines the efficiency and effectiveness of text classification methods from the perspective of researchers and practitioners. It shows that simpler models can match or exceed complex graph-based approaches. The contribution is incremental, but it challenges current trends.

The paper compares Bag-of-Words (BoW), graph-based, and sequence-based methods for text classification. A wide MLP on BoW features outperforms some graph-based models and is competitive with others, while fine-tuned BERT and DistilBERT achieve state-of-the-art results. These findings question whether the synthetic graphs used in text classifiers are necessary.
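To make the "wide MLP on BoW features" idea concrete, here is a minimal pure-Python sketch of the forward pass. It assumes whitespace tokenization, raw term counts as features, and a single hidden ReLU layer. The names `build_vocab`, `bow_vector`, `WideMLP`, and `softmax` are hypothetical. The hidden width is an illustrative choice, not the paper's setting. Training, dropout, and the authors' actual tokenizer are omitted.

```python
import math
import random


def build_vocab(docs):
    """Map each distinct lowercased whitespace token in the training docs to a column index."""
    vocab = {}
    for doc in docs:
        for tok in doc.lower().split():
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def bow_vector(doc, vocab):
    """Term-count vector over the training vocabulary; unseen tokens are dropped."""
    vec = [0.0] * len(vocab)
    for tok in doc.lower().split():
        idx = vocab.get(tok)
        if idx is not None:
            vec[idx] += 1.0
    return vec


class WideMLP:
    """Hypothetical sketch: BoW input -> one wide ReLU hidden layer -> class logits."""

    def __init__(self, n_in, n_hidden, n_classes, seed=0):
        rng = random.Random(seed)
        s1 = 1.0 / math.sqrt(n_in)
        s2 = 1.0 / math.sqrt(n_hidden)
        # Weights are random and untrained; fitting them (e.g. with cross-entropy loss) is omitted.
        self.w1 = [[rng.uniform(-s1, s1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-s2, s2) for _ in range(n_hidden)] for _ in range(n_classes)]
        self.b2 = [0.0] * n_classes

    def forward(self, x):
        # Hidden layer: ReLU(W1 x + b1)
        hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        # Output layer: W2 h + b2, one logit per class
        return [sum(w * h for w, h in zip(row, hidden)) + b
                for row, b in zip(self.w2, self.b2)]


def softmax(logits):
    """Numerically stable softmax turning logits into class probabilities."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Usage on toy data (hypothetical documents, two classes).
train_docs = ["graph networks classify text", "a wide mlp on bag of words"]
vocab = build_vocab(train_docs)
model = WideMLP(len(vocab), n_hidden=64, n_classes=2)
print(softmax(model.forward(bow_vector("an unseen text about words", vocab))))
```

A new document only needs to be mapped onto the fixed training vocabulary, and no graph has to be built or updated to classify it. This makes the inductive setting straightforward for a BoW model.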

Graph neural networks have triggered a resurgence of graph-based text classification methods, defining today's state of the art. We show that a wide multi-layer perceptron (MLP) using Bag-of-Words (BoW) input outperforms the recent graph-based models TextGCN and HeteGCN in an inductive text classification setting and is comparable with HyperGAT. Moreover, we fine-tune a sequence-based BERT and a lightweight DistilBERT model, both of which outperform all state-of-the-art models. These results question the importance of synthetic graphs used in modern text classifiers. In terms of efficiency, DistilBERT is still twice as large as our BoW-based wide MLP, while graph-based models like TextGCN require setting up an $\mathcal{O}(N^2)$ graph, where $N$ is the vocabulary plus corpus size. Finally, since Transformers need to compute $\mathcal{O}(L^2)$ attention weights with sequence length $L$, the MLP models show higher training and inference speeds on datasets with long sequences.
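The abstract's complexity argument can be checked with simple arithmetic. The sketch below counts the entries of a dense $N \times N$ word-and-document graph and the $L \times L$ attention weights per head and layer. The helper names are hypothetical and the dataset sizes are made up. The dense graph count is an upper bound, since practical implementations may store the graph sparsely.

```python
def text_graph_entries(vocab_size: int, n_docs: int) -> int:
    """Entries of a dense adjacency over word and document nodes: N^2 with N = vocab + docs."""
    n = vocab_size + n_docs
    return n * n


def attention_weights(seq_len: int, n_heads: int = 1, n_layers: int = 1) -> int:
    """Attention weights computed for one sequence: L^2 per head, per layer."""
    return seq_len * seq_len * n_heads * n_layers


# Hypothetical sizes, chosen only for illustration.
print(text_graph_entries(10_000, 5_000))                   # 225000000
print(attention_weights(512))                              # 262144
print(attention_weights(1024) // attention_weights(512))   # 4: doubling L quadruples the count
print(attention_weights(512, n_heads=12, n_layers=12))     # 37748736 for a 12-layer, 12-head shape
```

Doubling the sequence length quadruples the attention count. This is why the MLP's speed advantage is expected to be largest on datasets with long sequences.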

Code implementations: 2 repos