CL, May 11, 2017

Sketching Word Vectors Through Hashing

arXiv:1705.04253v2
Originality: Incremental advance
AI Analysis

This work addresses efficiency issues in natural language processing for researchers and practitioners, but it is incremental as it builds on random projection-based techniques.

The authors tackle the computational cost of word embedding techniques by proposing a fast method built on hash functions and sparse non-negative random projections. The method achieves results competitive with neural embeddings at a significantly reduced computational cost.

We propose a new fast word embedding technique using hash functions. The method is a derandomization of a new type of random projections: By disregarding the classic constraint used in designing random projections (i.e., preserving pairwise distances in a particular normed space), our solution exploits extremely sparse non-negative random projections. Our experiments show that the proposed method can achieve competitive results, comparable to neural embedding learning techniques, but with only a fraction of the computational complexity of these methods. While the proposed derandomization reduces the computational and space complexity of our method, the possibility of applying weighting methods such as positive pointwise mutual information (PPMI) to our models after their construction (and at a reduced dimensionality) imparts a high discriminatory power to the resulting embeddings. This method also comes with other known benefits of random projection-based techniques, such as ease of update.
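The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each context word is hashed to a single coordinate of a low-dimensional vector (one way to realize an extremely sparse, non-negative projection without storing a projection matrix), and that PPMI is applied afterwards to the reduced count vectors. The names `hash_index`, `sketch_counts`, and `ppmi`, the use of CRC32 as the hash function, and the `dim` and `window` parameters are all hypothetical choices made for illustration.

```python
import math
import zlib


def hash_index(word, dim):
    # Assumption: CRC32 stands in for the hash function. It is deterministic,
    # so the "projection" is derived from the word itself and never stored.
    return zlib.crc32(word.encode("utf-8")) % dim


def sketch_counts(sentences, dim=8, window=2):
    # Accumulate co-occurrence counts directly in the reduced space:
    # each context word adds 1 to exactly one coordinate (sparse, non-negative).
    vectors = {}
    for sent in sentences:
        for i, word in enumerate(sent):
            vec = vectors.setdefault(word, [0.0] * dim)
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vec[hash_index(sent[j], dim)] += 1.0
    return vectors


def ppmi(vectors, dim):
    # Positive PMI computed on the already-reduced vectors.
    total = sum(sum(v) for v in vectors.values())
    col = [sum(v[k] for v in vectors.values()) for k in range(dim)]
    weighted = {}
    for word, vec in vectors.items():
        row = sum(vec)
        weighted[word] = [
            max(0.0, math.log(c * total / (row * col[k]))) if c > 0 else 0.0
            for k, c in enumerate(vec)
        ]
    return weighted


sentences = [["the", "cat", "sat"], ["the", "dog", "sat"]]
counts = sketch_counts(sentences, dim=8, window=1)
embeddings = ppmi(counts, dim=8)
```

Because the counts are plain sums, new text can be folded in by continuing to increment the same vectors, which is one way to read the "ease of update" property mentioned above. Hash collisions merge distinct context words into one coordinate, so `dim` trades memory against discriminative power in this sketch.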

Code Implementations: 1 repo