CL · Aug 21, 2018

The Influence of Down-Sampling Strategies on SVD Word Embedding Stability

arXiv:1808.06810v21093 citations
Originality: Incremental advance
AI Analysis

This paper addresses concerns about the stability of word embeddings, an issue relevant to NLP researchers, and offers an incremental improvement.

The study investigates how down-sampling strategies affect the stability of SVD-based word embeddings. It finds that these strategies significantly influence both stability and accuracy, and it proposes a simple modification that improves stability while matching skip-gram accuracy.

The stability of word embedding algorithms, i.e., the consistency of the word representations they reveal when trained repeatedly on the same data set, has recently raised concerns. Here we compare word embedding algorithms on three corpora of different sizes and evaluate both their stability and accuracy. We find strong evidence that down-sampling strategies (used as part of their training procedures) are particularly influential for the stability of SVD_PPMI-type embeddings. This finding seems to explain diverging reports on their stability and leads us to a simple modification which provides superior stability as well as accuracy on par with skip-gram embeddings.
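The notion of stability described in the abstract can be illustrated with a minimal sketch. It assumes that stability is scored as the average Jaccard overlap between each word's top-k nearest neighbours in two training runs. This is one plausible measure, not necessarily the one used in the paper, and the function names and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_neighbours(embeddings, word, k):
    """Return the set of the k words most similar to `word` (ties broken alphabetically)."""
    scored = [
        (cosine(embeddings[word], vec), other)
        for other, vec in embeddings.items()
        if other != word
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return {other for _, other in scored[:k]}


def neighbour_stability(run_a, run_b, k=2):
    """Mean Jaccard overlap of top-k neighbour sets over the shared vocabulary.

    Assumed measure for illustration only: 1.0 means both runs agree on
    every word's neighbourhood, 0.0 means they never agree.
    """
    shared = sorted(set(run_a) & set(run_b))
    scores = []
    for word in shared:
        a = top_neighbours(run_a, word, k)
        b = top_neighbours(run_b, word, k)
        union = a | b
        scores.append(len(a & b) / len(union) if union else 1.0)
    return sum(scores) / len(scores) if scores else 0.0


# Toy "runs": hypothetical 2-d embeddings standing in for two trainings
# of the same algorithm on the same corpus.
run_1 = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0], "bus": [0.1, 0.9]}
run_2 = {"cat": [0.0, 1.0], "dog": [0.9, 0.1], "car": [0.0, 1.0], "bus": [0.1, 0.9]}

print(neighbour_stability(run_1, run_1, k=1))  # identical runs agree fully
print(neighbour_stability(run_1, run_2, k=1))  # "cat" moved, so agreement drops
```

Under this assumed measure, a training procedure with no random component would score 1.0 when repeated on the same data, while any stochastic step, such as probabilistic down-sampling, can change neighbourhoods between runs and lower the score.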

