Shape of Elephant: Study of Macro Properties of Word Embeddings Spaces
This work addresses a fundamental gap in NLP, relevant to both researchers and practitioners, by providing insights into the macro properties of word embedding spaces, although the contribution is incremental, as it builds on existing embedding models.
The paper tackles the problem of understanding the global geometry of word embeddings. It shows that embedding clouds form a high-dimensional simplex with interpretable vertices and proposes a method to enumerate these vertices in GloVe and fastText spaces.
Pre-trained word representations have become a key component of many NLP tasks. However, the global geometry of word embeddings remains poorly understood. In this paper, we demonstrate that a typical word embedding cloud is shaped as a high-dimensional simplex with interpretable vertices and propose a simple yet effective method for enumerating these vertices. We show that the proposed method can detect and describe the vertices of the simplex for GloVe and fastText spaces.
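To make the notion of "enumerating the vertices of a simplex" concrete, here is a minimal toy sketch. It is NOT the paper's method, which the abstract does not describe; it uses a generic, swapped-in technique: every point that maximizes a linear projection onto a direction is an extreme point (vertex) of the cloud's convex hull, so projecting onto many random directions and collecting the maximizers enumerates vertices. The functions `extreme_point_indices` and `synthetic_simplex_cloud` are hypothetical helpers, and the data is synthetic (random convex combinations of known vertices), not GloVe or fastText vectors.

```python
import random


def random_direction(dim, rng):
    # Gaussian coordinates give a direction uniformly distributed on the sphere.
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def extreme_point_indices(points, n_directions=200, seed=0):
    """Hypothetical helper: indices of points that maximize a random projection.

    Each maximizer is a vertex of the convex hull of `points`; for random
    directions the maximum is unique almost surely, so interior points
    (strict convex combinations of vertices) are never returned.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    found = set()
    for _ in range(n_directions):
        d = random_direction(dim, rng)
        scores = [sum(p_i * d_i for p_i, d_i in zip(p, d)) for p in points]
        found.add(max(range(len(points)), key=scores.__getitem__))
    return sorted(found)


def synthetic_simplex_cloud(vertices, n_points, seed=1):
    """Hypothetical toy data: the vertices plus random convex combinations of them."""
    rng = random.Random(seed)
    dim = len(vertices[0])
    cloud = [list(v) for v in vertices]  # indices 0..k-1 are the true vertices
    for _ in range(n_points):
        # Normalized exponential weights are Dirichlet(1, ..., 1) distributed.
        w = [rng.expovariate(1.0) for _ in vertices]
        total = sum(w)
        cloud.append([sum(wi * v[j] for wi, v in zip(w, vertices)) / total
                      for j in range(dim)])
    return cloud


if __name__ == "__main__":
    # A 3-simplex (tetrahedron) in 3D, standing in for an embedding cloud.
    verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    cloud = synthetic_simplex_cloud(verts, 500)
    print(extreme_point_indices(cloud))
```

On this toy cloud the sketch should recover exactly the four planted vertices. On real high-dimensional embeddings the raw convex hull is likely to have many more extreme points than a clean simplex, so this only illustrates what a vertex is, not how the authors find interpretable ones.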