CL · Sep 27, 2016

The Effects of Data Size and Frequency Range on Distributional Semantic Models

arXiv:1609.08293v1 · 65 citations
AI Analysis

This work addresses model robustness, a concern for researchers and practitioners in natural language processing, but its contribution is incremental: it compares existing models rather than introducing new methods.

The paper investigates how data size and frequency range affect distributional semantic models, finding that neural network-based models underperform with small data and that the inverted factorized model is the most reliable across varying conditions.

This paper investigates the effects of data size and frequency range on distributional semantic models. We compare the performance of a number of representative models for several test settings over data of varying sizes, and over test items of various frequency. Our results show that neural network-based models underperform when the data is small, and that the most reliable model over data of varying sizes and frequency ranges is the inverted factorized model.

