Do Convolutional Networks need to be Deep for Text Classification?
This work addresses the problem of choosing a network architecture for text classification and shows that depth is not always necessary. The contribution is incremental, but it offers practical insights for NLP researchers and practitioners.
The paper investigates the role of depth in convolutional networks for text classification. It finds that deep models perform better with character inputs, whereas with word inputs a shallow-and-wide network outperforms deep models such as DenseNet and achieves new state-of-the-art accuracies of 95.9% on Yelp Binary and 64.9% on Yelp Full.
In this work, we study the importance of depth in convolutional models for text classification, with either character or word inputs. We show on 5 standard text classification and sentiment analysis tasks that deep models indeed perform better than shallow networks when the text input is represented as a sequence of characters. However, with word inputs, a simple shallow-and-wide network outperforms deep models such as DenseNet. Our shallow word model further establishes new state-of-the-art results on two datasets: Yelp Binary (95.9%) and Yelp Full (64.9%).
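To make the "shallow-and-wide" idea concrete, here is a minimal sketch of a generic shallow-and-wide convolutional classifier over word inputs: a single convolutional layer with several filter widths applied in parallel (the "wide" part), ReLU, max-over-time pooling, and a linear softmax output. This is an illustration under stated assumptions, not the authors' model: the class name `ShallowWideCNN`, the filter widths, the number of filters, the vocabulary and embedding sizes, and the random initialisation are all hypothetical choices, and no training loop is included. Deep models such as DenseNet instead stack many convolutional layers.

```python
import math
import random

# Hypothetical toy hyperparameters; NOT the settings used in the paper.
VOCAB_SIZE = 50
EMBED_DIM = 8
FILTER_WIDTHS = (3, 4, 5)   # "wide": several filter widths side by side
NUM_FILTERS = 4             # feature maps per filter width
NUM_CLASSES = 2             # e.g. binary sentiment


def rand_vector(n, rng):
    """Small random weights (illustrative initialisation only)."""
    return [rng.gauss(0.0, 0.1) for _ in range(n)]


class ShallowWideCNN:
    """Hypothetical sketch: one conv layer (several widths) + max-over-time pooling + linear layer."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.embeddings = [rand_vector(EMBED_DIM, rng) for _ in range(VOCAB_SIZE)]
        # One filter bank per width; each filter covers `w` consecutive word vectors.
        self.filters = {
            w: [(rand_vector(w * EMBED_DIM, rng), 0.0) for _ in range(NUM_FILTERS)]
            for w in FILTER_WIDTHS
        }
        feat_dim = NUM_FILTERS * len(FILTER_WIDTHS)
        self.out_w = [rand_vector(feat_dim, rng) for _ in range(NUM_CLASSES)]
        self.out_b = [0.0] * NUM_CLASSES

    def features(self, token_ids):
        # Word embedding lookup: a sequence of EMBED_DIM-dimensional vectors.
        x = [self.embeddings[t] for t in token_ids]
        feats = []
        for w in FILTER_WIDTHS:
            # Zero-pad short inputs so every filter sees at least one window.
            seq = x + [[0.0] * EMBED_DIM] * max(0, w - len(x))
            for weights, bias in self.filters[w]:
                best = -math.inf
                for start in range(len(seq) - w + 1):
                    window = [v for row in seq[start:start + w] for v in row]
                    act = max(0.0, sum(a * b for a, b in zip(weights, window)) + bias)  # ReLU
                    best = max(best, act)  # max-over-time pooling
                feats.append(best)
        return feats

    def predict_proba(self, token_ids):
        h = self.features(token_ids)
        logits = [sum(wi * hi for wi, hi in zip(row, h)) + b
                  for row, b in zip(self.out_w, self.out_b)]
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


if __name__ == "__main__":
    model = ShallowWideCNN(seed=0)
    probs = model.predict_proba([3, 17, 42, 8, 25, 11])  # hypothetical word ids
    print(len(probs), round(sum(probs), 6))  # prints: 2 1.0
```

The depth of this sketch is a single convolutional layer; its capacity comes from the number of parallel filters and widths, which is the design trade-off the abstract contrasts with deep stacks.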