CL, Mar 23, 2015

Unsupervised POS Induction with Word Embeddings

arXiv:1503.06760v1, 76 citations
Originality: Incremental advance
AI Analysis

This work improves unsupervised part-of-speech induction, a form of unsupervised linguistic analysis for natural language processing. The advance is incremental because it modifies existing induction models rather than proposing a new one.

The paper tackles unsupervised part-of-speech (POS) induction by integrating word embeddings into existing models, and reports consistent improvements across eight languages.

Unsupervised word embeddings have been shown to be valuable as features in supervised learning problems; however, their role in unsupervised problems has been less thoroughly explored. In this paper, we show that embeddings can likewise add value to the problem of unsupervised POS induction. In two representative models of POS induction, we replace multinomial distributions over the vocabulary with multivariate Gaussian distributions over word embeddings and observe consistent improvements in eight languages. We also analyze the effect of various choices while inducing word embeddings on "downstream" POS induction results.
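
The substitution described in the abstract, a multivariate Gaussian over embeddings in place of a multinomial over the vocabulary, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the full-covariance parameterization, the two-tag setup, and all numeric values are assumptions chosen only for illustration.

```python
import numpy as np

def gaussian_log_density(x, mean, cov):
    """Log of N(x; mean, cov) for a single embedding vector x."""
    d = x.shape[0]
    diff = x - mean
    # Solve cov @ z = diff instead of inverting cov explicitly.
    z = np.linalg.solve(cov, diff)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + diff @ z)

def emission_log_probs(embedding, means, covs):
    """Score one word embedding under each tag's Gaussian emission."""
    return np.array([
        gaussian_log_density(embedding, m, c) for m, c in zip(means, covs)
    ])

# Toy setup (hypothetical): 2 induced tags, 3-dimensional embeddings.
means = np.array([[0.0, 0.0, 0.0],
                  [2.0, 2.0, 2.0]])
covs = np.array([np.eye(3), np.eye(3)])

# Embedding of one word; in a multinomial model this word would instead
# be looked up by its vocabulary index in a per-tag probability table.
word_vec = np.array([1.9, 2.1, 2.0])

scores = emission_log_probs(word_vec, means, covs)
best_tag = int(np.argmax(scores))
print(scores, best_tag)
```

In this sketch a word's emission score depends on where its embedding lies relative to each tag's mean and covariance, so words with nearby embeddings receive similar scores, whereas a multinomial emission treats every vocabulary item as an unrelated outcome.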

