CL, AI, AP, ML | Feb 28, 2013

KSU KDD: Word Sense Induction by Clustering in Topic Space

arXiv:1302.7056v1 | 21 citations
Originality: Synthesis-oriented
AI Analysis

This paper is an incremental contribution to natural language processing, specifically to word sense induction and disambiguation.

The paper tackles unsupervised word sense induction by clustering the instances of a target word in an LDA topic space. The system achieved the second-highest V-measure score in the SemEval-2 word sense induction and disambiguation task.
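For readers unfamiliar with the metric, V-measure is the harmonic mean of homogeneity (each induced cluster contains instances of a single gold sense) and completeness (each gold sense falls into a single cluster). The following is a minimal sketch of the standard definition, not the task's official scorer; the function name `v_measure` and the toy labels are illustrative assumptions.

```python
import math
from collections import Counter


def v_measure(gold, pred, beta=1.0):
    """Return (homogeneity, completeness, V-measure) for two labelings.

    gold: gold-standard sense label per instance.
    pred: induced cluster label per instance.
    Illustrative helper, not the SemEval-2 evaluation script.
    """
    n = len(gold)
    if n == 0 or n != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")

    gold_counts = Counter(gold)
    pred_counts = Counter(pred)
    joint = Counter(zip(gold, pred))

    def entropy(counts):
        return -sum(c / n * math.log(c / n) for c in counts.values())

    h_gold = entropy(gold_counts)
    h_pred = entropy(pred_counts)

    # Conditional entropies H(gold | pred) and H(pred | gold).
    h_gold_given_pred = -sum(
        c / n * math.log(c / pred_counts[p]) for (g, p), c in joint.items()
    )
    h_pred_given_gold = -sum(
        c / n * math.log(c / gold_counts[g]) for (g, p), c in joint.items()
    )

    homogeneity = 1.0 if h_gold == 0 else 1.0 - h_gold_given_pred / h_gold
    completeness = 1.0 if h_pred == 0 else 1.0 - h_pred_given_gold / h_pred

    if homogeneity + completeness == 0:
        return homogeneity, completeness, 0.0
    v = (1 + beta) * homogeneity * completeness / (beta * homogeneity + completeness)
    return homogeneity, completeness, v


# Toy example: cluster ids differ from the gold sense names, but the grouping matches.
gold_senses = ["river", "river", "money", "money"]
clusters = [1, 1, 0, 0]
h, c, v = v_measure(gold_senses, clusters)
```

Because the score depends only on how instances are grouped, renaming the cluster ids does not change it. Putting every instance into one cluster gives perfect completeness but zero homogeneity, so the V-measure drops to zero.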

We describe our language-independent, unsupervised word sense induction system. The system uses only topic features to cluster the senses of a word in the topic space of its global context. Using unlabeled data, the system trains a latent Dirichlet allocation (LDA) topic model and then uses it to infer the topic distributions of the test instances. Clustering these topic distributions in topic space groups the instances into different senses. Our hypothesis is that closeness in topic space reflects similarity between word senses. The system participated in the SemEval-2 word sense induction and disambiguation task and achieved the second-highest V-measure score among all participating systems.
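The clustering step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not name the clustering algorithm, so plain k-means with Euclidean distance is swapped in here. The topic distributions are hand-written stand-ins for what a trained LDA model would infer, and the names `kmeans` and `theta` are hypothetical.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(points, k, iterations=50):
    """Cluster vectors with k-means; returns (labels, centroids).

    Assumption: k-means stands in for whatever clustering the system used.
    Initialization is deterministic (farthest-first), so results are repeatable.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # Farthest-first initialization: start from the first point, then keep
    # adding the point farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical topic distributions for five instances of one target word,
# over three topics (each row sums to 1). Real values would come from LDA inference.
theta = [
    [0.80, 0.15, 0.05],
    [0.75, 0.20, 0.05],
    [0.10, 0.10, 0.80],
    [0.05, 0.15, 0.80],
    [0.85, 0.10, 0.05],
]
sense_labels, sense_centroids = kmeans(theta, k=2)
```

Instances whose contexts load on the same topics end up with the same label, which is the sense in which closeness in topic space is taken to reflect sense similarity. The number of clusters `k` is fixed by hand in this sketch; how the system chooses the number of senses is not stated in the abstract.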
