cs.CL · May 9, 2018

Incorporating Subword Information into Matrix Factorization Word Embeddings

arXiv:1805.03710v1 · 1094 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the handling of rare and out-of-vocabulary (OOV) words in NLP. Its contribution is incremental: it carries over the subword benefits already demonstrated for predictive models to counting models.

The paper investigated whether incorporating subword information into counting-based word embedding models improves performance, finding that it enhances representations for rare and out-of-vocabulary words.
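As background on what a counting model is, here is a minimal sketch of one textbook variant. It counts word-context co-occurrences in a window, converts the counts to positive pointwise mutual information (PPMI), and factorizes the matrix with truncated SVD. The function names, window size, and square-root weighting of singular values are illustrative assumptions, not the counting model used in the paper. Note that a word receives a row only if it was seen in the corpus, which is exactly why rare and OOV words are a problem for such models.

```python
from collections import Counter

import numpy as np


def cooccurrence_counts(sentences, window=2):
    """Count (word, context) pairs within a symmetric window (illustrative helper)."""
    counts = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(word, tokens[j])] += 1
    return counts


def ppmi_matrix(counts):
    """Turn pair counts into a dense PPMI matrix; returns (vocab, matrix)."""
    vocab = sorted({w for w, _ in counts} | {c for _, c in counts})
    index = {w: k for k, w in enumerate(vocab)}
    m = np.zeros((len(vocab), len(vocab)))
    for (w, c), n in counts.items():
        m[index[w], index[c]] = n
    total = m.sum()
    row = m.sum(axis=1, keepdims=True)
    col = m.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore"):  # log(0) -> -inf for unseen pairs
        pmi = np.log(m * total / (row * col))
    return vocab, np.maximum(pmi, 0.0)  # clip negatives and -inf to 0


def embed(ppmi, dim):
    """Truncated SVD; scaling by sqrt(singular values) is one common choice."""
    u, s, _ = np.linalg.svd(ppmi)
    return u[:, :dim] * np.sqrt(s[:dim])


# Toy corpus (hypothetical): every vocabulary word gets one row of `vectors`.
sentences = [["the", "cat", "sat"], ["the", "dog", "sat"]]
vocab, ppmi = ppmi_matrix(cooccurrence_counts(sentences, window=2))
vectors = embed(ppmi, dim=2)
```

A word such as "cats" that never occurs in the corpus has no row here, so this model alone cannot represent it.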

The positive effect of adding subword information to word embeddings has been demonstrated for predictive models. In this paper we investigate whether similar benefits can also be derived from incorporating subwords into counting models. We evaluate the impact of different types of subwords (n-grams and unsupervised morphemes), with results confirming the importance of subword information in learning representations of rare and out-of-vocabulary words.
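To illustrate how subwords help with OOV words, the sketch below splits a word into character n-grams and builds a vector for an unseen word from the n-grams it shares with known words. It assumes fastText-style `<` and `>` boundary markers, n-gram lengths 3 to 6, and averaging of known n-gram vectors; these are illustrative choices, and the paper's exact composition is not described here. The random table is a hypothetical stand-in for trained subword vectors, and the unsupervised-morpheme variant (which would replace `char_ngrams` with a morphological segmenter) is not shown.

```python
import random


def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of `word`, wrapped in < and > boundary markers."""
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams


def compose(word, subword_vectors, dim):
    """Average the vectors of the word's n-grams that have one; zeros if none do."""
    known = [subword_vectors[g] for g in char_ngrams(word) if g in subword_vectors]
    if not known:
        return [0.0] * dim
    return [sum(values) / len(known) for values in zip(*known)]


# Hypothetical subword table: random vectors stand in for trained ones.
rng = random.Random(0)
dim = 4
table = {
    gram: [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    for w in ["cats", "category"]
    for gram in char_ngrams(w)
}

# "catty" is OOV, but shares "<ca", "cat" and "<cat" with the known words.
oov = compose("catty", table, dim)
```

For example, `char_ngrams("cat")` yields `["<ca", "cat", "at>", "<cat", "cat>", "<cat>"]`. The boundary markers let prefixes such as `<cat` be distinguished from the same letters in the middle of a word.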

Code Implementations: 1 repo
