CL · Mar 23, 2015

Unsupervised POS Induction with Word Embeddings

Chu-Cheng Lin, Waleed Ammar, Chris Dyer, Lori Levin

arXiv:1503.06760v1 · 19.576 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of improving unsupervised linguistic analysis for natural language processing. It is an incremental advance: rather than proposing a new model, it extends two existing POS induction models.

The paper tackles unsupervised part-of-speech (POS) induction by integrating word embeddings into existing models, yielding consistent improvements across eight languages.

Unsupervised word embeddings have been shown to be valuable as features in supervised learning problems; however, their role in unsupervised problems has been less thoroughly explored. In this paper, we show that embeddings can likewise add value to the problem of unsupervised POS induction. In two representative models of POS induction, we replace multinomial distributions over the vocabulary with multivariate Gaussian distributions over word embeddings and observe consistent improvements in eight languages. We also analyze the effect of various choices while inducing word embeddings on "downstream" POS induction results.
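To make the substitution described in the abstract more concrete, here is a minimal sketch (not the authors' implementation) contrasting a multinomial emission, which is a lookup table over word types, with a Gaussian emission that scores a word through its embedding vector. The toy 2-dimensional embeddings, the tag names, the diagonal covariance, and the helper names `multinomial_emission`, `gaussian_log_emission`, and `best_tag` are all hypothetical, chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def multinomial_emission(tag_word_probs: Dict[str, float], word: str) -> float:
    """Classic emission p(word | tag): a table lookup over the vocabulary.
    An unsmoothed table gives probability 0 to any word it has not seen."""
    return tag_word_probs.get(word, 0.0)


def gaussian_log_emission(mean: Sequence[float], var: Sequence[float],
                          embedding: Sequence[float]) -> float:
    """Embedding-based emission: log N(embedding; mean, diag(var)).
    The diagonal covariance is a simplifying assumption of this sketch."""
    log_density = 0.0
    for x, mu, v in zip(embedding, mean, var):
        log_density += -0.5 * (math.log(2 * math.pi * v) + (x - mu) ** 2 / v)
    return log_density


def best_tag(tag_params: Dict[str, Tuple[List[float], List[float]]],
             embedding: Sequence[float]) -> str:
    """Return the tag whose Gaussian gives the embedding the highest log density."""
    def score(tag: str) -> float:
        mean, var = tag_params[tag]
        return gaussian_log_emission(mean, var, embedding)
    return max(tag_params, key=score)


# Hypothetical toy data; real embeddings would be learned and high-dimensional.
embeddings = {
    "dog": [1.0, 0.1],
    "cat": [0.9, 0.0],
    "run": [-1.0, 0.8],
    "puppy": [1.1, 0.2],  # never observed with any tag in the multinomial table
}
tag_params = {
    "NOUN": ([1.0, 0.0], [0.1, 0.1]),  # (mean, per-dimension variance)
    "VERB": ([-1.0, 1.0], [0.1, 0.1]),
}
noun_table = {"dog": 0.5, "cat": 0.5}

print(multinomial_emission(noun_table, "puppy"))   # the table has no entry for "puppy"
print(best_tag(tag_params, embeddings["puppy"]))   # the Gaussian can still score it
```

In this toy setup the unsmoothed multinomial assigns `puppy` zero probability under NOUN, while the Gaussian places it under NOUN because its embedding lies close to that tag's mean. Sharing statistical strength between words with similar vectors in this way is the intuition behind swapping multinomials for Gaussians over embeddings.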
