CL · Feb 10, 2018

TextZoo, a New Benchmark for Reconsidering Text Classification

Benyou Wang, Li Wang, Qikang Wei, Lichun Liu

arXiv:1802.03656v2

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for a standardized evaluation framework for NLP researchers and practitioners. Its contribution is incremental, however, since it builds on existing models and datasets.

The authors tackled the lack of a unified benchmark for comparing neural network models in text classification by re-implementing over 20 models across more than 10 datasets, resulting in an analysis that reveals the advantages of different components in various settings.

Text representation is a fundamental concern in Natural Language Processing, especially in text classification. Recently, many neural network approaches with elaborate representation models (e.g., FASTTEXT, CNN, RNN, and many hybrid models with attention mechanisms) have claimed state-of-the-art results on specific text classification datasets. However, there is no unified benchmark for comparing these models and revealing the advantages of each sub-component in various settings. We re-implement more than 20 popular text representation models for classification on more than 10 datasets. In this paper, we reconsider the text classification task from the perspective of neural networks and draw several findings from an analysis of these results.
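The core idea of a unified benchmark is that every model is fit on each dataset's training split and scored with the same metric on its test split. Below is a minimal, standard-library-only Python sketch of that evaluation grid. It makes several assumptions. The two models (a majority-class baseline and a multinomial naive Bayes classifier) are hypothetical stand-ins, not the neural models (FASTTEXT, CNN, RNN, attention hybrids) that TextZoo re-implements. The two tiny datasets are invented for illustration. Accuracy is used as the metric.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]                # (text, label)
Predictor = Callable[[str], str]
Model = Callable[[List[Example]], Predictor]  # fit on training data -> predictor


def majority_class(train: List[Example]) -> Predictor:
    """Baseline: always predict the most frequent training label."""
    top = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda text: top


def naive_bayes(train: List[Example]) -> Predictor:
    """Multinomial naive Bayes over whitespace tokens with add-one smoothing."""
    label_counts = Counter(label for _, label in train)
    word_counts = {label: Counter() for label in label_counts}
    for text, label in train:
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    totals = {label: sum(c.values()) for label, c in word_counts.items()}
    n = len(train)

    def predict(text: str) -> str:
        def log_score(label: str) -> float:
            s = math.log(label_counts[label] / n)  # log prior
            for w in text.split():
                if w in vocab:  # out-of-vocabulary tokens are ignored
                    s += math.log((word_counts[label][w] + 1) / (totals[label] + len(vocab)))
            return s
        return max(label_counts, key=log_score)

    return predict


def benchmark(models: Dict[str, Model],
              datasets: Dict[str, Tuple[List[Example], List[Example]]]) -> Dict[str, Dict[str, float]]:
    """Fit every model on every dataset and report test accuracy in one grid."""
    results: Dict[str, Dict[str, float]] = {}
    for model_name, fit in models.items():
        results[model_name] = {}
        for data_name, (train, test) in datasets.items():
            predict = fit(train)
            correct = sum(predict(text) == label for text, label in test)
            results[model_name][data_name] = correct / len(test)
    return results


# Hypothetical toy datasets: (train split, test split).
DATASETS = {
    "toy_sentiment": (
        [("great movie", "pos"), ("great acting", "pos"), ("fun story", "pos"),
         ("bad plot", "neg"), ("bad boring acting", "neg")],
        [("great fun", "pos"), ("bad boring", "neg")],
    ),
    "toy_topic": (
        [("goal match team", "sport"), ("team wins cup", "sport"), ("vote election", "politics")],
        [("team goal", "sport"), ("election vote", "politics")],
    ),
}
MODELS = {"majority": majority_class, "naive_bayes": naive_bayes}

if __name__ == "__main__":
    for model_name, row in benchmark(MODELS, DATASETS).items():
        print(model_name, row)
```

Treating each model as a `fit` function that returns a predictor keeps the harness independent of model internals. Adding another model or dataset only means adding a dictionary entry, and every entry is scored under the same protocol.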
