CL, Feb 6, 2016

Swivel: Improving Embeddings by Noticing What's Missing

arXiv:1602.02215v1, 81 citations
Originality: Incremental advance
AI Analysis

This paper addresses efficient and accurate embedding generation for natural language processing and related fields. It is an incremental improvement over existing factorization methods.

The authors developed Swivel, a method that generates low-dimensional feature embeddings from co-occurrence matrices. It approximately factorizes the point-wise mutual information matrix using stochastic gradient descent, with special handling for unobserved co-occurrences. The resulting embeddings are more accurate than those from methods that consider only observed co-occurrences, and the method scales to larger corpora than sampling methods can handle.

We present Submatrix-wise Vector Embedding Learner (Swivel), a method for generating low-dimensional feature embeddings from a feature co-occurrence matrix. Swivel performs approximate factorization of the point-wise mutual information matrix via stochastic gradient descent. It uses a piecewise loss with special handling for unobserved co-occurrences, and thus makes use of all the information in the matrix. While this requires computation proportional to the size of the entire matrix, we make use of vectorized multiplication to process thousands of rows and columns at once to compute millions of predicted values. Furthermore, we partition the matrix into shards in order to parallelize the computation across many nodes. This approach results in more accurate embeddings than can be achieved with methods that consider only observed co-occurrences, and can scale to much larger corpora than can be handled with sampling methods.
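
The abstract describes the piecewise loss and the sharded, vectorized computation only in words. The following is a minimal sketch of how one shard's loss might be computed, not the authors' implementation. It assumes a weighted squared error on observed cells, a soft hinge on unobserved cells (with the PMI smoothed by treating the missing count as 1), and an illustrative weighting function. The name `shard_loss`, its parameters, the weighting constants, and the toy data are all hypothetical.

```python
import numpy as np


def shard_loss(row_emb, col_emb, counts, row_sums, col_sums, total):
    """Piecewise loss for one shard of the co-occurrence matrix (sketch).

    row_emb: (r, d) row embeddings; col_emb: (c, d) column embeddings.
    counts: (r, c) co-occurrence counts for this shard.
    row_sums, col_sums: marginal counts for the shard's rows and columns.
    total: sum of all counts in the full matrix.
    """
    # One matrix multiplication predicts every cell in the shard at once.
    pred = row_emb @ col_emb.T

    observed = counts > 0
    # Assumption: for unobserved cells, use a count of 1 so the log is defined.
    safe_counts = np.where(observed, counts, 1.0)
    pmi = (np.log(safe_counts) + np.log(total)
           - np.log(row_sums)[:, None] - np.log(col_sums)[None, :])

    # Observed cells: squared error, weighted by an assumed confidence
    # function that grows with the count.
    weight = 0.1 + 0.25 * np.sqrt(safe_counts)
    sq_err = 0.5 * weight * (pred - pmi) ** 2

    # Unobserved cells: soft hinge log(1 + exp(pred - pmi)), which mainly
    # penalizes predictions that exceed the smoothed PMI.
    soft_hinge = np.logaddexp(0.0, pred - pmi)

    return np.where(observed, sq_err, soft_hinge).sum()


# Toy data: a 4x3 count matrix with several unobserved (zero) cells.
rng = np.random.default_rng(0)
counts = np.array([[4.0, 0.0, 1.0],
                   [0.0, 3.0, 0.0],
                   [2.0, 0.0, 5.0],
                   [1.0, 1.0, 0.0]])
row_sums, col_sums, total = counts.sum(axis=1), counts.sum(axis=0), counts.sum()
W = rng.normal(scale=0.1, size=(4, 2))  # row embeddings
C = rng.normal(scale=0.1, size=(3, 2))  # column embeddings

# Split the rows into two shards; each could be handled by a different node.
shards = [slice(0, 2), slice(2, 4)]
loss = sum(shard_loss(W[s], C, counts[s], row_sums[s], col_sums, total)
           for s in shards)
```

Because the loss is a sum over cells, the per-shard losses add up to the loss over the whole matrix. That is what lets the shards be processed independently, with each one still covering every cell it contains, observed or not.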

Code Implementations: 3 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
