What the Vec? Towards Probabilistically Grounded Embeddings
This paper provides foundational insights for researchers in NLP and graph embedding by explaining the success of widely used algorithms, though its contribution is incremental, as it clarifies existing methods.
The paper addresses the lack of theoretical understanding of why Word2Vec and GloVe embeddings work. It shows that interactions between pointwise mutual information (PMI) vectors reflect semantic relationships, such as similarity and paraphrasing, that are encoded in low-dimensional embeddings under a suitable projection.
Word2Vec (W2V) and GloVe are popular, fast and efficient word embedding algorithms. Their embeddings are widely used and perform well on a variety of natural language processing tasks. Moreover, W2V has recently been adopted in the field of graph embedding, where it underpins several leading algorithms. However, despite their ubiquity and relatively simple model architecture, a theoretical understanding of what the embedding parameters of W2V and GloVe learn and why that is useful in downstream tasks has been lacking. We show that different interactions between PMI vectors reflect semantic word relationships, such as similarity and paraphrasing, that are encoded in low-dimensional word embeddings under a suitable projection, theoretically explaining why embeddings of W2V and GloVe work. As a consequence, we also reveal an interesting mathematical interconnection between the considered semantic relationships themselves.
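To make the notion of a PMI vector concrete, here is a minimal sketch, not the paper's method. A word's PMI vector is its row of the matrix PMI(w, c) = log p(w, c) / (p(w) p(c)) over all context words c. It is known that W2V's skip-gram with negative sampling implicitly factorizes a shifted version of this matrix (Levy and Goldberg, 2014). The toy corpus, the window size `WINDOW`, and the helper names `cooccurrence_counts`, `pmi_vectors` and `cosine` are hypothetical choices made for illustration. Zero-count pairs, whose PMI is minus infinity, are set to 0 here as a common practical convention, which is again an assumption and not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical toy corpus, for illustration only.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the mat",
    "the cat chased the mouse",
    "the dog chased the ball",
    "stocks fell on the market",
    "shares fell on the market",
]

WINDOW = 2  # symmetric context window size (an assumption)


def cooccurrence_counts(sentences, window):
    """Count (word, context) pairs within a symmetric window."""
    pairs = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, w in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs[(w, tokens[j])] += 1
    return pairs


def pmi_vectors(pairs):
    """Return {word: [PMI(word, c) for c in contexts]} and the context order."""
    total = sum(pairs.values())
    word_counts, ctx_counts = Counter(), Counter()
    for (w, c), n in pairs.items():
        word_counts[w] += n
        ctx_counts[c] += n
    contexts = sorted(ctx_counts)
    vectors = {}
    for w in word_counts:
        vec = []
        for c in contexts:
            n = pairs[(w, c)]
            if n == 0:
                # Unseen pair: PMI would be -inf; set to 0 by convention (assumption).
                vec.append(0.0)
            else:
                # PMI(w, c) = log p(w, c) / (p(w) p(c)), estimated from counts.
                vec.append(math.log(n * total / (word_counts[w] * ctx_counts[c])))
        vectors[w] = vec
    return vectors, contexts


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


vectors, contexts = pmi_vectors(cooccurrence_counts(corpus, WINDOW))
print(f"cat~dog:    {cosine(vectors['cat'], vectors['dog']):.3f}")
print(f"cat~stocks: {cosine(vectors['cat'], vectors['stocks']):.3f}")
```

In this toy corpus, "cat" and "dog" occur with exactly the same context counts, so their PMI vectors coincide and their cosine similarity prints as 1.000, while "cat" and "stocks" share only the context "on" and score lower. Learned embeddings would instead approximate such a matrix with a low-rank factorization; that projection step is omitted from this sketch.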