CL May 22, 2023

Discovering Universal Geometry in Embeddings with ICA

arXiv:2305.13175v2, 141 citations
Originality: Highly original
AI Analysis

This work offers a foundational insight into embedding representations and may help machine learning researchers better understand the geometric patterns that embeddings share.

The study used Independent Component Analysis (ICA) to reveal a consistent semantic structure in embeddings from pre-trained models, showing that embeddings can be expressed as a few interpretable axes that remain consistent across languages, algorithms, and modalities.

This study utilizes Independent Component Analysis (ICA) to unveil a consistent semantic structure within embeddings of words or images. Our approach extracts independent semantic components from the embeddings of a pre-trained model by leveraging anisotropic information that remains after the whitening process in Principal Component Analysis (PCA). We demonstrate that each embedding can be expressed as a composition of a few intrinsic interpretable axes and that these semantic axes remain consistent across different languages, algorithms, and modalities. The discovery of a universal semantic structure in the geometric patterns of embeddings enhances our understanding of the representations in embeddings.
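The pipeline described in the abstract (PCA whitening followed by ICA) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it uses synthetic Laplace-distributed sources as a stand-in for real embeddings, a hand-picked mixing matrix, and a basic symmetric FastICA with a tanh nonlinearity, all of which are assumptions made for illustration. The helper names (`pca_whiten`, `sym_decorrelate`, `fast_ica`) are hypothetical.

```python
import numpy as np

def pca_whiten(X, n_components):
    """Center X, project onto the top principal axes, rescale to unit variance."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # PCA scores are U * S; dividing by their std (S / sqrt(n)) leaves U * sqrt(n).
    return U[:, :n_components] * np.sqrt(X.shape[0])

def sym_decorrelate(W):
    """Symmetric orthogonalization: W <- (W W^T)^{-1/2} W."""
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(s)) @ u.T @ W

def fast_ica(Z, n_iter=200, tol=1e-6, seed=0):
    """Basic symmetric FastICA on whitened data Z (rows are samples)."""
    n, d = Z.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((d, d)))
    for _ in range(n_iter):
        Y = Z @ W.T                      # current component estimates
        G = np.tanh(Y)                   # nonlinearity g
        G_prime = 1.0 - G ** 2           # its derivative g'
        # Fixed-point update per row w: E[z g(w.z)] - E[g'(w.z)] w
        W_new = sym_decorrelate((G.T @ Z) / n - G_prime.mean(axis=0)[:, None] * W)
        # Converged when every row is unchanged up to sign.
        delta = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if delta < tol:
            break
    return Z @ W.T, W

# Synthetic stand-in for embeddings: non-Gaussian sources, linearly mixed.
rng = np.random.default_rng(42)
S = rng.laplace(size=(5000, 3))          # hypothetical "semantic" sources
A = np.array([[1.0, 0.5, 0.2],
              [0.3, 1.0, 0.4],
              [0.2, 0.6, 1.0]])          # hypothetical mixing matrix
X = S @ A.T

Z = pca_whiten(X, n_components=3)        # whitening removes second-order structure
Y, W = fast_ica(Z)                       # ICA rotates Z toward independent axes

# For each true source, the best absolute correlation with a recovered component.
corr = np.abs(np.corrcoef(S.T, Y.T)[:3, 3:])
best_match = corr.max(axis=1)
print(best_match)
```

After whitening, the covariance of `Z` is the identity, so any remaining structure is non-Gaussian; the orthogonal matrix `W` found by ICA is the rotation that exposes it. Recovered components are only determined up to sign and ordering, which is why the check uses absolute correlations.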

Code Implementations: 1 repo
