CL · Jun 2, 2016

Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations

arXiv:1606.00819v2 · 82 citations
AI Analysis

This paper addresses the problem of improving word embeddings for natural language processing applications, but it appears incremental because it builds on existing matrix factorization approaches.

The paper tackles the problem of generating distributed word representations by proposing LexVec, a method using low-rank weighted factorization of the Positive Point-wise Mutual Information matrix with stochastic gradient descent, which matches or outperforms state-of-the-art methods on word similarity and analogy tasks.

In this paper, we propose LexVec, a new method for generating distributed word representations that uses low-rank, weighted factorization of the Positive Point-wise Mutual Information matrix via stochastic gradient descent, employing a weighting scheme that assigns heavier penalties for errors on frequent co-occurrences while still accounting for negative co-occurrence. Evaluation on word similarity and analogy tasks shows that LexVec matches and often outperforms state-of-the-art methods on many of these tasks.
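The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a simplified toy version written under assumptions, and the function names (`ppmi`, `factorize`), hyperparameters, and the count-based noise distribution are all hypothetical choices made for illustration. It builds a PPMI matrix from co-occurrence counts, then fits two low-rank factors with stochastic gradient descent. Observed pairs are sampled in proportion to their counts, so errors on frequent co-occurrences are penalized more often, and each sampled pair is accompanied by a few randomly drawn negative pairs.

```python
import numpy as np


def ppmi(counts):
    """Positive PMI of a square co-occurrence count matrix."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    expected = row * col / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts / expected)
    pmi[~np.isfinite(pmi)] = 0.0  # zero counts give -inf or nan; treat as 0
    return np.maximum(pmi, 0.0)


def factorize(counts, dim=2, epochs=200, negatives=2, lr=0.05, seed=0):
    """Toy SGD factorization of the PPMI matrix (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    target = ppmi(counts)
    vocab = counts.shape[0]
    W = rng.normal(scale=0.1, size=(vocab, dim))  # word vectors
    C = rng.normal(scale=0.1, size=(vocab, dim))  # context vectors
    pair_p = (counts / counts.sum()).ravel()      # sample pairs by frequency
    noise_p = counts.sum(axis=0) / counts.sum()   # assumed noise distribution

    def step(i, j):
        # One SGD step on the squared error 0.5 * (W[i].C[j] - PPMI[i, j])^2.
        err = W[i] @ C[j] - target[i, j]
        grad_w = err * C[j]
        grad_c = err * W[i]
        W[i] -= lr * grad_w
        C[j] -= lr * grad_c

    n_pairs = int(counts.sum())
    for _ in range(epochs):
        for idx in rng.choice(vocab * vocab, size=n_pairs, p=pair_p):
            i, j = divmod(int(idx), vocab)
            step(i, j)  # observed (window-sampled) pair
            for neg in rng.choice(vocab, size=negatives, p=noise_p):
                step(i, int(neg))  # negative-sampled pair
    return W, C, target


# Hypothetical 4-word vocabulary with symmetric co-occurrence counts.
toy_counts = np.array([
    [0, 8, 1, 0],
    [8, 0, 0, 1],
    [1, 0, 0, 6],
    [0, 1, 6, 0],
])
W, C, target = factorize(toy_counts, dim=4)
reconstruction = W @ C.T  # approximates the PPMI matrix
```

Because frequent pairs are drawn more often, their reconstruction error receives more gradient updates, which is one simple way to realize the weighting the abstract describes. Refer to the paper itself for the exact sampling scheme and loss.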

Code Implementations: 1 repo
