CL · AI · LG · Nov 29, 2017

Embedding Words as Distributions with a Bayesian Skip-gram Model

arXiv:1711.11027v2 · 104 citations
Originality: Incremental advance
AI Analysis

This work addresses a limitation of static word embeddings, which assign each word a single fixed vector, by representing words as probability densities instead. The advance is incremental because it builds on existing Gaussian embedding methods.

The paper replaces fixed word embeddings with a Bayesian model that embeds words as probability densities, which allows context-specific meanings. It reports competitive results on standard benchmarks and shows that the context-dependent densities can be used directly for lexical substitution.

We introduce a method for embedding words as probability densities in a low-dimensional space. Rather than assuming that a word embedding is fixed across the entire text collection, as in standard word embedding methods, in our Bayesian model we generate it from a word-specific prior density for each occurrence of a given word. Intuitively, for each word, the prior density encodes the distribution of its potential 'meanings'. These prior densities are conceptually similar to Gaussian embeddings. Interestingly, unlike the Gaussian embeddings, we can also obtain context-specific densities: they encode uncertainty about the sense of a word given its context and correspond to posterior distributions within our model. The context-dependent densities have many potential applications: for example, we show that they can be directly used in the lexical substitution task. We describe an effective estimation method based on the variational autoencoding framework. We also demonstrate that our embeddings achieve competitive results on standard benchmarks.
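The abstract's core ingredients (a word-specific prior density, a context-dependent posterior density, and variational autoencoder-style estimation) can be illustrated with a small sketch. This is not the authors' implementation: it assumes diagonal Gaussian densities, and the function names, dimensionality, and parameter values are all hypothetical. It shows a reparameterised sample from a posterior and the closed-form KL divergence between posterior and prior, which is the regularisation term that typically appears in a variational objective.

```python
import numpy as np


def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) between two diagonal Gaussians (assumed density family)."""
    var_q = sigma_q ** 2
    var_p = sigma_p ** 2
    return 0.5 * np.sum(
        np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )


def sample_embedding(mu, sigma, rng):
    """Reparameterised sample z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps


rng = np.random.default_rng(0)
dim = 4  # hypothetical embedding dimensionality

# Hypothetical word-specific prior: a broad density over the word's meanings.
prior_mu, prior_sigma = np.zeros(dim), np.ones(dim)

# Hypothetical context-specific posterior: narrower, shifted by the context.
post_mu = np.array([0.5, -0.2, 0.0, 0.1])
post_sigma = np.full(dim, 0.5)

z = sample_embedding(post_mu, post_sigma, rng)  # one occurrence-level embedding
kl = kl_diag_gaussians(post_mu, post_sigma, prior_mu, prior_sigma)
print("sampled embedding:", z)
print("KL(posterior || prior):", kl)
```

Under the same assumptions, a divergence of this kind between a word's context-specific posterior and the prior of each candidate word is one plausible way to rank candidates in a lexical substitution setting.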

Code Implementations: 1 repo