The Effects of Data Size and Frequency Range on Distributional Semantic Models
This work addresses the problem of model robustness for researchers and practitioners in natural language processing, but its contribution is incremental, since it compares existing models without introducing new methods.
The paper investigates how data size and frequency range affect distributional semantic models, finding that neural network-based models underperform with small data and that the inverted factorized model is the most reliable across varying conditions.
This paper investigates the effects of data size and frequency range on distributional semantic models. We compare the performance of several representative models in a range of test settings, over data of varying sizes and over test items of varying frequency. Our results show that neural network-based models underperform when the data is small, and that the most reliable model across data sizes and frequency ranges is the inverted factorized model.
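To make the notion of "test items of varying frequency" concrete, the following is a minimal sketch of how test items might be grouped into frequency ranges before evaluation. It is an illustration only and not the procedure used in the paper: the function name `bucket_by_frequency`, the three range labels, and the cut-off values in `bounds` are all hypothetical.

```python
from collections import Counter


def bucket_by_frequency(test_items, corpus_tokens, bounds=(10, 100)):
    """Group test items into low/medium/high frequency ranges.

    `bounds` holds two hypothetical cut-offs (inclusive upper limits for the
    low and medium ranges); they are assumptions, not values from the paper.
    """
    counts = Counter(corpus_tokens)  # raw corpus frequency of every token
    low_max, mid_max = bounds
    buckets = {"low": [], "medium": [], "high": []}
    for item in test_items:
        freq = counts[item]  # Counter returns 0 for unseen items
        if freq <= low_max:
            buckets["low"].append(item)
        elif freq <= mid_max:
            buckets["medium"].append(item)
        else:
            buckets["high"].append(item)
    return buckets


# Toy corpus: one frequent, one mid-frequency, and one rare word.
corpus = ["the"] * 200 + ["cat"] * 50 + ["axolotl"] * 3
buckets = bucket_by_frequency(["the", "cat", "axolotl"], corpus)
print(buckets)
```

Each model could then be scored separately on every bucket, and the same split repeated on corpora of different sizes, to see how performance changes with both factors.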