CL AI · Nov 12, 2020

Deconstructing word embedding algorithms

Kian Kenyon-Dean, Edward Newell, Jackie Chi Kit Cheung

arXiv:2011.07013v131.1993 citations

Originality: Synthesis-oriented

AI Analysis

This is an incremental analysis that offers a retrospective on established algorithms, potentially aiding researchers in designing more efficient embeddings for NLP tasks in resource-limited settings.

The paper deconstructs Word2vec, GloVe, and other word embedding algorithms into a common form to identify conditions required for high performance, providing theoretical insights for future model development.

Word embeddings are reliable feature representations of words used to obtain high quality results for various NLP applications. Uncontextualized word embeddings are used in many NLP tasks today, especially in resource-limited settings where high memory capacity and GPUs are not available. Given the historical success of word embeddings in NLP, we propose a retrospective on some of the most well-known word embedding algorithms. In this work, we deconstruct Word2vec, GloVe, and others, into a common form, unveiling some of the common conditions that seem to be required for making performant word embeddings. We believe that the theoretical findings in this paper can provide a basis for more informed development of future models.
