CLApr 9, 2018

Efficient Graph-based Word Sense Induction by Distributional Inclusion Vector Embeddings

arXiv:1804.03257v21090 citations
Originality Incremental advance
AI Analysis

This addresses polysemy in NLP for tasks like disambiguation and interpretability, but it is incremental as it builds on existing graph-based methods with efficiency gains.

The paper tackles word sense induction by proposing a graph-based method that uses distributional inclusion vector embeddings to form an interpretable basis, avoiding expensive nearest neighbor search. Experiments on three datasets show it achieves similar or better sense clusters and embeddings than state-of-the-art methods with significantly improved efficiency.

Word sense induction (WSI), which addresses polysemy by unsupervised discovery of multiple word senses, resolves ambiguities for downstream NLP tasks and also makes word representations more interpretable. This paper proposes an accurate and efficient graph-based method for WSI that builds a global non-negative vector embedding basis (which are interpretable like topics) and clusters the basis indexes in the ego network of each polysemous word. By adopting distributional inclusion vector embeddings as our basis formation model, we avoid the expensive step of nearest neighbor search that plagues other graph-based methods without sacrificing the quality of sense clusters. Experiments on three datasets show that our proposed method produces similar or better sense clusters and embeddings compared with previous state-of-the-art methods while being significantly more efficient.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes