CL · AI · May 23, 2017

Second-Order Word Embeddings from Nearest Neighbor Topological Features

arXiv:1705.08488v1
7 citations
Originality: Incremental advance
AI Analysis

This work targets downstream performance and data heterogeneity in NLP models that rely on pre-trained word embeddings. It is an incremental advance, since it builds on existing embedding methods.

The paper introduces second-order word embeddings derived from nearest neighbor topological features of pre-trained embeddings. It finds that these embeddings capture most of the performance benefits of pre-trained embeddings and handle heterogeneous data better, in experiments on named entity recognition, recognizing textual entailment, and paraphrase recognition.

We introduce second-order vector representations of words, induced from nearest neighborhood topological features in pre-trained contextual word embeddings. We then analyze the effects of using second-order embeddings as input features in two deep natural language processing models, for named entity recognition and recognizing textual entailment, as well as a linear model for paraphrase recognition. Surprisingly, we find that nearest neighbor information alone is sufficient to capture most of the performance benefits derived from using pre-trained word embeddings. Furthermore, second-order embeddings are able to handle highly heterogeneous data better than first-order representations, though at the cost of some specificity. Additionally, augmenting contextual embeddings with second-order information further improves model performance in some cases. Due to variance in the random initializations of word embeddings, utilizing nearest neighbor features from multiple first-order embedding samples can also contribute to downstream performance gains. Finally, we identify intriguing characteristics of second-order embedding spaces for further research, including much higher density and different semantic interpretations of cosine similarity.
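To make the idea of "nearest neighbor topological features" concrete, here is a minimal sketch, not the paper's implementation. It assumes a toy vocabulary with made-up 2-dimensional first-order vectors, builds a k-nearest-neighbor graph by cosine similarity, and then uses each word's neighbor-indicator row as a crude stand-in for a second-order representation. The abstract does not specify how the authors turn the neighborhood graph into vectors, so the helper names (`knn_graph`, `second_order_vectors`) and the indicator-row encoding are illustrative assumptions only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def knn_graph(embeddings, k):
    """Map each word to its k nearest neighbors by cosine similarity.

    Hypothetical helper: ties are broken alphabetically so the result
    is deterministic.
    """
    graph = {}
    for word, vec in embeddings.items():
        scored = [
            (cosine(vec, other), w)
            for w, other in embeddings.items()
            if w != word
        ]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        graph[word] = [w for _, w in scored[:k]]
    return graph


def second_order_vectors(graph):
    """Encode each word by which vocabulary items are its neighbors.

    Assumption: a 0/1 indicator row per word. This only uses
    neighborhood information, discarding the original coordinates.
    """
    vocab = sorted(graph)
    index = {w: i for i, w in enumerate(vocab)}
    vectors = {}
    for word, neighbors in graph.items():
        row = [0.0] * len(vocab)
        for n in neighbors:
            row[index[n]] = 1.0
        vectors[word] = row
    return vocab, vectors


# Toy first-order embeddings (invented values, for illustration only).
toy = {
    "cat": [1.0, 0.1],
    "dog": [0.9, 0.2],
    "car": [0.1, 1.0],
    "bus": [0.2, 0.9],
}

graph = knn_graph(toy, k=1)
vocab, second = second_order_vectors(graph)
```

In this sketch the second-order vectors depend only on the graph structure, which is the property the abstract highlights: neighborhood information alone, without the original coordinates. A realistic pipeline would use a much larger vocabulary and a dense graph-embedding step in place of the indicator rows.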

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
