On SkipGram Word Embedding Models with Negative Sampling: Unified Framework and Impact of Noise Distributions
This work addresses the problem of optimizing word embeddings for natural language processing, offering incremental improvements in model design and training efficiency.
The authors introduce a unified framework, Word-Context Classification (WCC), that generalizes SkipGram with negative sampling (SGN) models, and find experimentally that using the data distribution as the noise distribution improves both embedding performance and training convergence speed.
SkipGram word embedding models with negative sampling, or SGN for short, form an elegant family of word embedding models. In this paper, we formulate a framework for word embedding, referred to as Word-Context Classification (WCC), that generalizes SGN to a wide family of models. The framework, which uses "noise examples", is justified through theoretical analysis. The impact of the noise distribution on the learning of WCC embedding models is studied experimentally; the results suggest that the best noise distribution is, in fact, the data distribution, in terms of both embedding performance and speed of convergence during training. Along the way, we discover several novel embedding models that outperform existing WCC models.
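
For readers unfamiliar with the role a noise distribution plays in SGN, the following is a minimal sketch of the standard negative-sampling loss for a single word-context pair. It is not the WCC framework itself, and it does not reproduce the paper's experiments. The function names (`make_noise_distribution`, `sgn_loss`), the toy counts, and the `power` parameter are assumptions made for illustration. The abstract does not spell out how the data distribution is constructed as a noise distribution, so the two settings shown (`power=0.75`, the common smoothed-unigram heuristic, and `power=1.0`, raw empirical context frequencies) are only examples of swapping one noise distribution for another.

```python
import math
import random


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def make_noise_distribution(counts, power=0.75):
    """Normalize counts**power into probabilities over context ids.

    power=0.75 is the usual smoothed-unigram heuristic; power=1.0 gives
    the raw empirical frequencies. Both are illustrative choices.
    """
    weights = [c ** power for c in counts]
    total = sum(weights)
    return [w / total for w in weights]


def sgn_loss(word_vec, context_vecs, positive_id, noise_probs, k, rng):
    """Negative-sampling loss for one observed (word, context) pair.

    The observed context is treated as a positive example, and k context
    ids drawn from noise_probs are treated as negative (noise) examples.
    """
    negative_ids = rng.choices(range(len(noise_probs)), weights=noise_probs, k=k)
    loss = -log_sigmoid(dot(word_vec, context_vecs[positive_id]))
    for j in negative_ids:
        loss -= log_sigmoid(-dot(word_vec, context_vecs[j]))
    return loss


# Toy setup (hypothetical): four context ids with made-up counts.
counts = [4, 1, 1, 2]
smoothed_noise = make_noise_distribution(counts, power=0.75)
empirical_noise = make_noise_distribution(counts, power=1.0)

rng = random.Random(0)
word_vec = [0.1, -0.2, 0.3]
context_vecs = [[0.2, 0.1, -0.1], [0.0, 0.3, 0.1], [-0.2, 0.1, 0.0], [0.1, 0.1, 0.1]]
loss = sgn_loss(word_vec, context_vecs, positive_id=0, noise_probs=empirical_noise, k=3, rng=rng)
```

In this sketch, changing the noise distribution amounts to passing a different `noise_probs` list; the loss function itself is unchanged.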