CL, IR, LG
Aug 31, 2016

Hash2Vec, Feature Hashing for Word Embeddings

arXiv:1608.08940v1
13 citations
Originality: Incremental advance
AI Analysis

The paper provides a scalable technique for NLP applications, though the advance is incremental: it adapts an existing method, feature hashing, to a new task.

The paper tackles the problem of creating word embeddings for natural language processing by applying feature hashing. The result is a scalable algorithm that needs no training and captures semantic meaning with results similar to GloVe.

In this paper we propose applying feature hashing to create word embeddings for natural language processing. Feature hashing has been used successfully to create document vectors in related tasks such as document classification. In this work we show that feature hashing can be applied to obtain word embeddings in time linear in the size of the data. The results show that this algorithm, which requires no training, is able to capture the semantic meaning of words. We compare the results against GloVe and show that they are similar. As far as we know, this is the first application of feature hashing to the word embeddings problem, and the results indicate that it is a scalable technique with practical results for NLP applications.
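
To make the idea in the abstract concrete, here is a minimal sketch of how feature hashing could produce word vectors in a single pass over a corpus. It is an illustration under stated assumptions, not the paper's exact algorithm: the function names (`hashed_embeddings`, `cosine`), the parameters `dim` and `window`, the 1/distance context weighting, the signed-hash trick, and the use of MD5 as the hash function are all choices made for this example.

```python
import hashlib
import math
from collections import defaultdict


def _hash(token: str, salt: str) -> int:
    """Deterministic hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.md5((salt + token).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


def hashed_embeddings(tokens, dim=16, window=2):
    """Single pass over the corpus, no training step.

    Each context word is hashed to one of `dim` buckets in the target
    word's vector. A second hash picks the sign, a common feature-hashing
    trick (assumed here) to keep collisions from adding up systematically.
    """
    vectors = defaultdict(lambda: [0.0] * dim)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            context = tokens[j]
            bucket = _hash(context, "bucket") % dim
            sign = 1.0 if _hash(context, "sign") % 2 == 0 else -1.0
            weight = 1.0 / abs(i - j)  # assumption: nearer context counts more
            vectors[word][bucket] += sign * weight
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy usage: words that share contexts end up with overlapping buckets.
corpus = "the cat sat on the mat the dog sat on the rug".split()
emb = hashed_embeddings(corpus, dim=16, window=2)
print(cosine(emb["cat"], emb["dog"]))
```

The loop visits each token once and looks at a fixed number of neighbours, so the running time grows linearly with the corpus length for a fixed `window`, which matches the linear-time claim in the abstract. The vector size `dim` is fixed up front and does not depend on the vocabulary size.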

