CL · Jun 2, 2016

Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations

Alexandre Salle, Marco Idiart, Aline Villavicencio

arXiv:1606.00819v2 · 15.682 citations · Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the problem of improving word embeddings for natural language processing applications, but it appears to be an incremental advance, since it builds on existing matrix factorization approaches.

The paper tackles the problem of generating distributed word representations by proposing LexVec, a method using low-rank weighted factorization of the Positive Point-wise Mutual Information matrix with stochastic gradient descent, which matches or outperforms state-of-the-art methods on word similarity and analogy tasks.
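The factorization target is the Positive Point-wise Mutual Information (PPMI) matrix, defined from co-occurrence counts M as PPMI(w, c) = max(0, log(M_wc · M_** / (M_w* · M_*c))). A minimal sketch of building it from a token list might look like the following; the helper names `window_pairs` and `ppmi_matrix` are hypothetical, and the plain symmetric context window is an assumption for illustration, not the authors' exact preprocessing.

```python
import math
from collections import Counter


def window_pairs(tokens, window=2):
    """Yield (word, context) pairs for every context within `window` tokens."""
    for i, w in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield w, tokens[j]


def ppmi_matrix(pairs):
    """Return a sparse PPMI dict {(w, c): value} from co-occurrence pairs.

    PPMI(w, c) = max(0, log(M_wc * M_** / (M_w* * M_*c))), where * means
    summing over that index. Pairs with PMI <= 0, and unobserved pairs,
    are left out of the dict and are therefore implicitly zero.
    """
    counts = Counter(pairs)
    word_totals, ctx_totals = Counter(), Counter()
    for (w, c), n in counts.items():
        word_totals[w] += n
        ctx_totals[c] += n
    total = sum(counts.values())
    ppmi = {}
    for (w, c), n in counts.items():
        pmi = math.log(n * total / (word_totals[w] * ctx_totals[c]))
        if pmi > 0:
            ppmi[(w, c)] = pmi
    return ppmi
```

For example, `ppmi_matrix(window_pairs("the cat sat on the mat".split(), window=1))` yields the sparse PPMI entries for that toy sentence. Storing only positive entries keeps the matrix sparse, which matters because most word/context pairs in a real corpus never co-occur.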

In this paper, we propose LexVec, a new method for generating distributed word representations that uses low-rank, weighted factorization of the Positive Point-wise Mutual Information matrix via stochastic gradient descent, employing a weighting scheme that assigns heavier penalties for errors on frequent co-occurrences while still accounting for negative co-occurrence. Evaluation on word similarity and analogy tasks shows that LexVec matches and often outperforms state-of-the-art methods on many of these tasks.
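The weighting scheme described in the abstract can be obtained implicitly through sampling, as the title's "window sampling" and "negative sampling" suggest: every (word, context) pair inside a sliding window triggers a squared-error update toward its PPMI value, so frequent co-occurrences are penalized more often, while randomly drawn negative contexts pull unobserved pairs toward zero. The sketch below illustrates that idea with plain SGD and is not the authors' implementation. The function name `train_sketch`, the unigram^0.75 noise distribution, drawing `negatives` samples after each window pair, and the small uniform initialization are all assumptions.

```python
import random


def train_sketch(tokens, ppmi, dim=10, window=2, negatives=3,
                 lr=0.025, epochs=5, seed=0):
    """Hypothetical LexVec-style SGD trainer (illustrative, not the paper's code).

    Each sampled pair (w, c) gets one update on 0.5 * (W_w . C_c - PPMI_wc)^2,
    where PPMI_wc is looked up in `ppmi` and is 0 for unlisted pairs.
    Returns word vectors, context vectors and the mean loss of each epoch.
    """
    rng = random.Random(seed)
    vocab = sorted(set(tokens))
    W = {v: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for v in vocab}
    C = {v: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for v in vocab}
    # Assumed noise distribution: unigram counts raised to the 0.75 power.
    noise_weights = [tokens.count(v) ** 0.75 for v in vocab]

    def step(w, c):
        # One SGD update on the squared error for the pair (w, c).
        wv, cv = W[w], C[c]
        err = sum(a * b for a, b in zip(wv, cv)) - ppmi.get((w, c), 0.0)
        for i in range(dim):
            gw, gc = err * cv[i], err * wv[i]  # gradients before updating
            wv[i] -= lr * gw
            cv[i] -= lr * gc
        return 0.5 * err * err

    epoch_losses = []
    for _ in range(epochs):
        total, n = 0.0, 0
        for i, w in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                total += step(w, tokens[j])  # window sample
                n += 1
                # Negative samples: random contexts fitted to their own PPMI.
                for c in rng.choices(vocab, weights=noise_weights, k=negatives):
                    total += step(w, c)
                    n += 1
        epoch_losses.append(total / max(n, 1))
    return W, C, epoch_losses
```

Because negative contexts are fitted to their actual PPMI value (usually 0) rather than to an arbitrary label, positive and negative updates optimize the same weighted least-squares objective; the sampling frequencies alone supply the weights.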

