CL AI LG ML Oct 30, 2018

Word Mover's Embedding: From Word2Vec to Document Embedding

arXiv:1811.01713v11127 citations
Originality: Highly original
AI Analysis

This paper addresses the challenge of creating effective document embeddings for text classification and similarity tasks, with particular benefit for applications involving short texts. It represents a novel extension rather than an incremental improvement.

The paper tackles the problem of generating unsupervised document embeddings from word embeddings and proposes Word Mover's Embedding (WME) as a solution. The results show that WME consistently matches or outperforms state-of-the-art techniques on 9 benchmark text classification datasets and 22 textual similarity tasks, with significantly higher accuracy on short-length problems.

While the celebrated Word2Vec technique yields semantically rich representations for individual words, there has been relatively less success in extending it to generate unsupervised sentence or document embeddings. Recent work has demonstrated that Word Mover's Distance (WMD), a distance measure between documents that aligns semantically similar words, yields unprecedented KNN classification accuracy. However, WMD is expensive to compute, and it is hard to extend its use beyond a KNN classifier. In this paper, we propose the Word Mover's Embedding (WME), a novel approach to building an unsupervised document (sentence) embedding from pre-trained word embeddings. In our experiments on 9 benchmark text classification datasets and 22 textual similarity tasks, the proposed technique consistently matches or outperforms state-of-the-art techniques, with significantly higher accuracy on problems of short length.
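
To make the cost of WMD concrete, here is a minimal sketch of the distance that the abstract builds on, written as an optimal-transport linear program. It is an illustration only and not the paper's WME method: the function name `wmd`, the 2-D toy word vectors, and the uniform word weights are assumptions made for the example, and SciPy's general-purpose `linprog` solver stands in for a dedicated transport solver.

```python
import numpy as np
from scipy.optimize import linprog


def wmd(X, a, Y, b):
    """Word Mover's Distance between two documents (illustrative sketch).

    X: (n, d) word vectors of document 1, a: (n,) word weights summing to 1
    Y: (m, d) word vectors of document 2, b: (m,) word weights summing to 1
    """
    n, m = len(a), len(b)
    # cost[i, j] = Euclidean distance between word i of doc 1 and word j of doc 2
    cost = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)

    # The flow matrix T (n x m) is flattened row-major into n*m variables.
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0   # sum_j T[i, j] = a[i]
    for j in range(m):
        A_eq[n + j, j::m] = 1.0            # sum_i T[i, j] = b[j]
    b_eq = np.concatenate([a, b])

    # Minimize total transport cost subject to T >= 0 and the marginals above.
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.fun


# Hypothetical toy "documents": two words each, in a 2-D embedding space.
doc1 = np.array([[0.0, 0.0], [1.0, 0.0]])
doc2 = np.array([[0.0, 1.0], [1.0, 1.0]])
weights = np.array([0.5, 0.5])

print(wmd(doc1, weights, doc2, weights))
```

In the toy example each word in the first document sits exactly one unit away from its nearest counterpart in the second, so the printed distance should be close to 1.0. The linear program has one variable per word pair, which hints at why computing WMD for every pair of documents in a corpus becomes expensive.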

Code Implementations: 1 repo
