CL, May 22, 2023

Discovering Universal Geometry in Embeddings with ICA

Hiroaki Yamagiwa, Momose Oyama, Hidetoshi Shimodaira

arXiv:2305.13175v222.4141 citations, Has Code

Originality: Highly original

AI Analysis

This work offers a foundational insight into embedding representations and may help machine learning and AI researchers better understand the geometric patterns in embeddings.

The study used Independent Component Analysis (ICA) to reveal a consistent semantic structure in embeddings from pre-trained models, showing that embeddings can be expressed as a few interpretable axes that remain consistent across languages, algorithms, and modalities.

This study utilizes Independent Component Analysis (ICA) to unveil a consistent semantic structure within embeddings of words or images. Our approach extracts independent semantic components from the embeddings of a pre-trained model by leveraging anisotropic information that remains after the whitening process in Principal Component Analysis (PCA). We demonstrate that each embedding can be expressed as a composition of a few intrinsic interpretable axes and that these semantic axes remain consistent across different languages, algorithms, and modalities. The discovery of a universal semantic structure in the geometric patterns of embeddings enhances our understanding of the representations in embeddings.
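The two-stage idea in the abstract (PCA whitening, then ICA on what remains) can be sketched on toy data. This is a minimal illustration, not the authors' code: the "embeddings" are synthetic linear mixtures of Laplace-distributed sources rather than vectors from a pre-trained model, scikit-learn's `FastICA` is used as a stand-in ICA solver, and the names `sources`, `mixing`, and `match_score` are hypothetical. After whitening, the data has identity covariance, so any remaining structure is non-Gaussian; ICA then looks for an orthogonal rotation that makes the axes as independent as possible.

```python
import numpy as np
from scipy.stats import kurtosis
from sklearn.decomposition import PCA, FastICA

# Assumption: synthetic stand-in for embeddings, built from heavy-tailed
# (Laplace) independent sources mixed by a random matrix.
rng = np.random.default_rng(0)
n_samples, dim = 10000, 5
sources = rng.laplace(size=(n_samples, dim))
mixing = rng.normal(size=(dim, dim))
embeddings = sources @ mixing.T

# Step 1: PCA whitening (centered, uncorrelated, unit-variance axes).
pca = PCA(whiten=True, random_state=0)
whitened = pca.fit_transform(embeddings)

# Step 2: ICA on the already-whitened data. With whiten=False, FastICA
# only estimates an orthogonal rotation of its input.
ica = FastICA(whiten=False, random_state=0, max_iter=1000)
components = ica.fit_transform(whitened)
rotation = ica.components_


def match_score(estimated, truth):
    """For each true source, the best absolute correlation with any estimated axis."""
    n = truth.shape[1]
    corr = np.corrcoef(estimated.T, truth.T)[:n, n:]
    return np.abs(corr).max(axis=0)


print("mean excess kurtosis, PCA axes:", kurtosis(whitened, axis=0).mean())
print("mean excess kurtosis, ICA axes:", kurtosis(components, axis=0).mean())
print("best |corr| per true source, PCA:", match_score(whitened, sources))
print("best |corr| per true source, ICA:", match_score(components, sources))
```

In this toy setting the ICA axes should line up with the original sources much more closely than the PCA axes do, which is the sense in which whitened data still carries recoverable structure. Whether the recovered axes are interpretable or consistent across models is an empirical claim of the paper that this sketch does not test.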

