CL May 9, 2018

Incorporating Subword Information into Matrix Factorization Word Embeddings

arXiv:1805.03710v132.01094 citations Has Code

Originality Synthesis-oriented

AI Analysis

This work addresses the handling of rare and out-of-vocabulary words in NLP. Its contribution is incremental: it extends the known benefits of subword information from predictive models to counting models.

The paper investigated whether incorporating subword information into counting-based word embedding models improves performance, finding that it enhances representations for rare and out-of-vocabulary words.

The positive effect of adding subword information to word embeddings has been demonstrated for predictive models. In this paper we investigate whether similar benefits can also be derived from incorporating subwords into counting models. We evaluate the impact of different types of subwords (n-grams and unsupervised morphemes), with results confirming the importance of subword information in learning representations of rare and out-of-vocabulary words.
