CL · Apr 30, 2020

Analyzing the Surprising Variability in Word Embedding Stability Across Languages

arXiv:2004.14876v2665 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses a practical problem for researchers who use word embeddings in multilingual NLP applications: embedding reliability is inconsistent across languages. The contribution is incremental, since it analyzes existing methods on new data.

The paper investigates how word embedding stability varies across languages and finds that stability correlates with linguistic properties such as affixing and gender systems. This has implications for studies that use embeddings to track language trends.

Word embeddings are powerful representations that form the foundation of many natural language processing architectures, both in English and in other languages. To gain further insight into word embeddings, we explore their stability (e.g., overlap between the nearest neighbors of a word in different embedding spaces) in diverse languages. We discuss linguistic properties that are related to stability, drawing out insights about correlations with affixing, language gender systems, and other features. This has implications for embedding use, particularly in research that uses them to study language trends.
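The stability notion mentioned in the abstract (overlap between a word's nearest neighbors in different embedding spaces) can be illustrated with a minimal sketch. It assumes cosine similarity, a fixed neighborhood size `k`, and a percentage-overlap score for a single pair of spaces; the function names and the toy vectors are hypothetical and are not taken from the paper's code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(word, space, k):
    """Return the k words in `space` most similar to `word` (excluding itself)."""
    target = space[word]
    others = [w for w in space if w != word]
    others.sort(key=lambda w: cosine(target, space[w]), reverse=True)
    return others[:k]


def stability(word, space_a, space_b, k=10):
    """Percent overlap between the k nearest neighbors of `word` in two spaces."""
    neighbors_a = set(nearest_neighbors(word, space_a, k))
    neighbors_b = set(nearest_neighbors(word, space_b, k))
    return 100.0 * len(neighbors_a & neighbors_b) / k


# Toy two-dimensional "embedding spaces" (illustrative values only).
space_a = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0], "bus": [0.1, 0.9]}
space_b = {"cat": [1.0, 0.0], "dog": [0.8, 0.2], "car": [0.5, 0.5], "bus": [0.0, 1.0]}

print(stability("cat", space_a, space_b, k=2))  # 50.0: only "dog" is shared
```

With more than two embedding spaces, one plausible extension is to average this score over all pairs of spaces; that choice is likewise an assumption of this sketch.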

Code Implementations: 1 repo