Factors Influencing the Surprising Instability of Word Embeddings
This work addresses a critical limitation in natural language processing. It exposes reliability issues in widely used embedding methods, which matters to researchers and practitioners alike.
The paper investigates the instability of word embeddings. It shows that even relatively high-frequency words (100-200 occurrences) can be unstable. It then analyzes which factors affect stability and how stability in turn affects downstream tasks.
Despite the recent popularity of word embedding methods, there is only a small body of work exploring the limitations of these representations. In this paper, we consider one aspect of embedding spaces, namely their stability. We show that even relatively high frequency words (100-200 occurrences) are often unstable. We provide empirical evidence for how various factors contribute to the stability of word embeddings, and we analyze the effects of stability on downstream tasks.
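The abstract does not say how stability is measured. The sketch below is a minimal illustration that assumes one plausible measure. It compares a word's k nearest neighbors (by cosine similarity) in two embedding spaces, such as two models trained with different random seeds, and reports the percentage of neighbors the two lists share. The helper names (`nearest_neighbors`, `stability`, `mean_stability`), the default `k=10`, the pairwise averaging, and the toy vectors are hypothetical choices made for illustration. They are not taken from the paper.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence

# Hypothetical representation: an embedding space maps each word to a
# non-zero vector. All vectors in a space share the same dimension.
Embeddings = Dict[str, Sequence[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(word: str, space: Embeddings, k: int) -> List[str]:
    """Return the k words most cosine-similar to `word`, excluding the word itself."""
    if len(space) - 1 < k:
        raise ValueError("space has fewer than k other words")
    target = space[word]
    scored = [(cosine(target, vec), other) for other, vec in space.items() if other != word]
    # Sort by descending similarity. Break ties alphabetically so the result is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]


def stability(word: str, space_a: Embeddings, space_b: Embeddings, k: int = 10) -> float:
    """Percent overlap (0-100) of `word`'s k nearest neighbors in two spaces."""
    nn_a = set(nearest_neighbors(word, space_a, k))
    nn_b = set(nearest_neighbors(word, space_b, k))
    return 100.0 * len(nn_a & nn_b) / k


def mean_stability(word: str, spaces: List[Embeddings], k: int = 10) -> float:
    """Average pairwise stability over every pair of spaces (e.g. one space per seed)."""
    scores = [stability(word, a, b, k) for a, b in combinations(spaces, 2)]
    return sum(scores) / len(scores)


# Toy 2-D spaces (invented data): "cat" keeps "lion" as a neighbor
# across the two spaces, but swaps "dog" for "car".
space_a = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "lion": [0.8, 0.3],
           "car": [0.0, 1.0], "bus": [0.1, 0.9]}
space_b = {"cat": [1.0, 0.0], "dog": [0.2, 0.9], "lion": [0.9, 0.2],
           "car": [0.95, 0.05], "bus": [0.1, 0.9]}

print(stability("cat", space_a, space_b, k=2))  # prints 50.0
```

Under this measure, a word counts as unstable when its neighborhood changes between training runs, even if its own vector is well trained. Averaging over all pairs of spaces keeps the score from depending on which two runs happen to be compared.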