CL
Jun 18, 2018

SubGram: Extending Skip-gram Word Representation with Substrings

arXiv:1806.06571v1
4 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for better word embeddings in natural language processing, though it appears incremental as it refines an existing method.

The authors extend the Skip-gram model to incorporate word structure (substrings) into word vector representations, and report large gains on the original Skip-gram test set.

Skip-gram (word2vec) is a recent method for creating vector representations of words ("distributed word representations") using a neural network. The representation gained popularity in various areas of natural language processing, because it seems to capture syntactic and semantic information about words without any explicit supervision in this respect. We propose SubGram, a refinement of the Skip-gram model to consider also the word structure during the training process, achieving large gains on the Skip-gram original test set.

Code Implementations: 1 repo
