CL · LG · ML · Jun 9, 2015

WordRank: Learning Word Embeddings via Robust Ranking

arXiv:1506.02761v4 · 38 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of efficient and robust word embedding learning for natural language processing, particularly in sparse and noisy data scenarios, though it is incremental as it builds on existing ranking-based insights.

The authors frame word embedding learning as a ranking problem and propose WordRank, which uses robust ranking losses. WordRank is competitive on word similarity and word analogy benchmarks and outperforms state-of-the-art methods when training data is limited: with 17 million tokens, it performs almost as well as existing methods trained on 7.2 billion tokens on a popular word similarity benchmark.

Embedding words in a vector space has gained a lot of attention in recent years. While state-of-the-art methods provide efficient computation of word similarities via a low-dimensional matrix embedding, their motivation is often left unclear. In this paper, we argue that word embedding can be naturally viewed as a ranking problem due to the ranking nature of the evaluation metrics. Then, based on this insight, we propose a novel framework WordRank that efficiently estimates word representations via robust ranking, in which the attention mechanism and robustness to noise are readily achieved via the DCG-like ranking losses. The performance of WordRank is measured in word similarity and word analogy benchmarks, and the results are compared to the state-of-the-art word embedding techniques. Our algorithm is very competitive with the state of the art on large corpora, while it outperforms them by a significant margin when the training set is limited (i.e., sparse and noisy). With 17 million tokens, WordRank performs almost as well as existing methods using 7.2 billion tokens on a popular word similarity benchmark. Our multi-node distributed implementation of WordRank is publicly available for general usage.
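To make the ranking view in the abstract concrete, here is a minimal sketch of a DCG-like loss applied to the rank of a context word. It is an illustration under stated assumptions, not the paper's objective: the function names `rank_of` and `dcg_like_loss`, the toy scores, and the choice of `log2(1 + rank)` as the concave transform are all hypothetical.

```python
import math

def rank_of(scores, target):
    """0-based rank of `target`: how many other contexts score strictly higher."""
    return sum(1 for c, s in scores.items() if c != target and s > scores[target])

def dcg_like_loss(rank):
    """Concave penalty on the rank (an assumed form, chosen for illustration).

    The penalty rises steeply near the top of the list and flattens further
    down, so mistakes at the top matter most and a single badly ranked
    (possibly noisy) pair cannot dominate the total loss.
    """
    return math.log2(1 + rank)

# Hypothetical relevance scores of candidate context words for one target word.
scores = {"queen": 2.1, "throne": 1.7, "the": 0.9, "banana": -0.3}

for context in scores:
    r = rank_of(scores, context)
    print(f"{context}: rank={r}, loss={dcg_like_loss(r):.3f}")
```

Because the transform is concave, moving a context from rank 0 to rank 1 costs far more than moving it from rank 100 to rank 101, which is one way to read the abstract's pairing of attention to the top of the list with robustness to noise.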

Code Implementations: 2 repos
