CL · May 30, 2016

Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change

arXiv:1605.09096v6 · 1007 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of modeling language and cultural evolution by providing a data-driven way to test theories of semantic change. Its contribution is incremental, since it applies existing embedding methods to historical data rather than proposing new ones.

The researchers tackled the problem of quantifying how word meanings change over time by developing a robust word-embedding methodology, and they used it to reveal two statistical laws: the rate of semantic change follows an inverse power law of word frequency, and more polysemous words change faster regardless of frequency.
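The first law is a power law, rate = c * frequency^beta with beta < 0, which becomes a straight line in log-log space. A minimal sketch of estimating the exponent follows, assuming paired per-word frequencies and change rates; it is a plain ordinary-least-squares fit on log-transformed values, not the authors' statistical model (which also accounts for polysemy), and the function name `fit_power_law` and the data are hypothetical.

```python
import math


def fit_power_law(freqs, rates):
    """Fit log(rate) = beta * log(freq) + log(c) by ordinary least squares.

    Returns (beta, c). A negative beta corresponds to change rates that
    fall off as an inverse power of frequency.
    """
    if len(freqs) != len(rates) or len(freqs) < 2:
        raise ValueError("need at least two paired observations")
    xs = [math.log(f) for f in freqs]
    ys = [math.log(r) for r in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("frequencies must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    log_c = mean_y - beta * mean_x
    return beta, math.exp(log_c)


# Hypothetical synthetic data generated exactly from rate = 2 * freq ** -0.5
freqs = [10, 100, 1000, 10000]
rates = [2 * f ** -0.5 for f in freqs]
beta, c = fit_power_law(freqs, rates)
print(round(beta, 3), round(c, 3))  # -0.5 2.0
```

Because the synthetic data lie exactly on a power law, the fit recovers the generating exponent and constant; on real corpora the points scatter and the slope is only an estimate.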

Understanding how words change their meanings over time is key to models of language and cultural evolution, but historical data on meaning is scarce, making theories hard to develop and test. Word embeddings show promise as a diachronic tool, but have not been carefully evaluated. We develop a robust methodology for quantifying semantic change by evaluating word embeddings (PPMI, SVD, word2vec) against known historical changes. We then use this methodology to reveal statistical laws of semantic evolution. Using six historical corpora spanning four languages and two centuries, we propose two quantitative laws of semantic change: (i) the law of conformity: the rate of semantic change scales with an inverse power-law of word frequency; (ii) the law of innovation: independent of frequency, words that are more polysemous have higher rates of semantic change.
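Quantifying change means comparing a word's vectors from embeddings trained separately on different time periods. For dense embeddings such as SVD or word2vec, separately trained spaces are only comparable up to an arbitrary rotation, so one standard approach is to align them with orthogonal Procrustes and then take the cosine distance between each word's aligned vectors. The sketch below assumes two (vocab_size, dim) NumPy arrays whose rows refer to the same words in the same order; the helper names and the synthetic data are hypothetical, and this is an illustration rather than the authors' released code.

```python
import numpy as np


def normalize_rows(m):
    """Scale each row (one word vector) to unit L2 norm."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    return m / np.maximum(norms, 1e-12)


def procrustes_align(base, other):
    """Return `other` rotated by the orthogonal R minimizing ||other @ R - base||_F."""
    # Orthogonal Procrustes solution: if other.T @ base = U S Vt, then R = U @ Vt.
    u, _, vt = np.linalg.svd(other.T @ base)
    return other @ (u @ vt)


def semantic_displacement(emb_t1, emb_t2):
    """Per-word cosine distance between period t1 and aligned period t2."""
    a = normalize_rows(emb_t1)
    b = normalize_rows(procrustes_align(a, normalize_rows(emb_t2)))
    return 1.0 - np.sum(a * b, axis=1)


rng = np.random.default_rng(0)
vocab_size, dim = 50, 10
emb_t1 = rng.standard_normal((vocab_size, dim))
# Random orthogonal matrix standing in for the arbitrary rotation between training runs.
q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
emb_t2 = emb_t1 @ q
emb_t2[7] = rng.standard_normal(dim)  # simulate a meaning change for word 7 (synthetic)

disp = semantic_displacement(emb_t1, emb_t2)
print(int(np.argmax(disp)), float(disp[7]))
```

With a pure rotation between the two periods every displacement is essentially zero, so after alignment the word whose vector was replaced stands out with the largest distance.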

Code implementations: 6 repos