CL LG · Jun 13, 2021

Shape of Elephant: Study of Macro Properties of Word Embeddings Spaces

arXiv:2106.06964v1
Originality: Incremental advance
AI Analysis

This work addresses a fundamental gap in NLP by giving researchers and practitioners insight into the macro properties of word embeddings. It is an incremental advance, however, because it builds on existing embedding models.

The paper tackles the problem of understanding the global geometry of word embeddings. It shows that embedding clouds form a high-dimensional simplex with interpretable vertices and proposes a method for enumerating these vertices in GloVe and fastText spaces.

Pre-trained word representations have become a key component in many NLP tasks. However, the global geometry of word embeddings remains poorly understood. In this paper, we demonstrate that a typical word embeddings cloud is shaped as a high-dimensional simplex with interpretable vertices, and we propose a simple yet effective method for enumerating these vertices. We show that the proposed method can detect and describe vertices of the simplex for GloVe and fastText spaces.
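The abstract does not spell out the enumeration procedure. Purely as an illustration of what "finding the vertices of a simplex-shaped point cloud" means, the sketch below uses the successive projection algorithm (SPA), a standard technique for picking out simplex vertices from points that lie inside the simplex. This is an assumption for illustration, not the authors' method. The helper names (`dot`, `remove_direction`, `successive_projection`) and the three-dimensional toy data are hypothetical; they are not GloVe or fastText vectors.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def remove_direction(v: Sequence[float], u: Sequence[float], uu: float) -> List[float]:
    """Return v minus its projection onto u, where uu = <u, u>."""
    c = dot(v, u) / uu
    return [vi - c * ui for vi, ui in zip(v, u)]


def successive_projection(points: Sequence[Sequence[float]], k: int) -> List[int]:
    """Successive projection algorithm (SPA), used here as an illustrative stand-in.

    Returns the indices of up to k points acting as simplex vertices, assuming
    every point is a convex combination of k linearly independent vertices
    that are themselves present among the points.
    """
    residuals = [[float(x) for x in p] for p in points]
    chosen: List[int] = []
    for _ in range(k):
        norms = [dot(r, r) for r in residuals]
        # The point with the largest residual norm is an extreme point.
        j = max(range(len(residuals)), key=norms.__getitem__)
        if norms[j] <= 1e-12:  # every point is already explained
            break
        chosen.append(j)
        u, uu = residuals[j], norms[j]
        # Remove the direction of the vertex just found from every residual.
        residuals = [remove_direction(r, u, uu) for r in residuals]
    return chosen


# Hypothetical toy cloud: three vertices A, B, C plus mixtures of them.
A, B, C = (3.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 1.0)
points = [
    (1.5, 1.0, 0.0),      # (A + B) / 2
    A,
    (1.0, 2 / 3, 1 / 3),  # (A + B + C) / 3
    B,
    (0.0, 1.0, 0.5),      # (B + C) / 2
    C,
]
print(successive_projection(points, 3))  # → [1, 3, 5]
```

Because SPA only selects existing data points, each detected vertex is an actual item in the cloud (in an embedding space, a word), which is one simple way to attach a label to a vertex. SPA assumes noise-free, linearly independent vertices, so on real embeddings it would at best give a rough approximation.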
