Discovering Universal Geometry in Embeddings with ICA
This work offers a foundational insight into how embeddings represent meaning, which may help machine learning and AI researchers better understand the geometric patterns in embedding spaces.
The study uses Independent Component Analysis (ICA) to reveal a consistent semantic structure in embeddings from pre-trained models, showing that each embedding can be expressed as a composition of a few interpretable axes that remain consistent across languages, algorithms, and modalities.
This study uses Independent Component Analysis (ICA) to reveal a consistent semantic structure within embeddings of words or images. Our approach extracts independent semantic components from the embeddings of a pre-trained model by leveraging the anisotropic information that remains after the whitening process in Principal Component Analysis (PCA). We demonstrate that each embedding can be expressed as a composition of a few intrinsic interpretable axes and that these semantic axes remain consistent across different languages, algorithms, and modalities. The discovery of a universal semantic structure in the geometric patterns of embeddings enhances our understanding of how embeddings represent information.
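The pipeline described in the abstract (PCA whitening followed by ICA) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `pca_whiten`, `sym_decorrelate`, and `fast_ica` are hypothetical, the ICA step is a plain symmetric FastICA with a tanh nonlinearity chosen for illustration, and the "embeddings" are synthetic mixtures of heavy-tailed Laplace sources rather than vectors from a real pre-trained model.

```python
import numpy as np

def pca_whiten(X):
    """Center X (n_samples, n_dims) and rescale so its covariance is the identity."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / Xc.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Project onto principal directions and scale each to unit variance.
    return Xc @ eigvecs / np.sqrt(eigvals)

def sym_decorrelate(W):
    """Orthogonalize W symmetrically: (W W^T)^{-1/2} W."""
    s, u = np.linalg.eigh(W @ W.T)
    return u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ W

def fast_ica(Z, n_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA (tanh nonlinearity) on already-whitened data Z."""
    rng = np.random.default_rng(seed)
    n, d = Z.shape
    W = sym_decorrelate(rng.standard_normal((d, d)))
    for _ in range(n_iter):
        Y = Z @ W.T                      # current component estimates, shape (n, d)
        G = np.tanh(Y)
        G_prime = 1.0 - G ** 2
        # Fixed-point update: E[z g(w^T z)] - E[g'(w^T z)] w, for every row w of W.
        W_new = sym_decorrelate(G.T @ Z / n - np.diag(G_prime.mean(axis=0)) @ W)
        # Rows are compared up to sign; stop once they no longer change.
        delta = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if delta < tol:
            break
    return Z @ W.T, W

# Synthetic stand-in for embeddings (an assumption for illustration only):
# four independent heavy-tailed sources, linearly mixed.
rng = np.random.default_rng(0)
S_true = rng.laplace(size=(5000, 4))
A = rng.standard_normal((4, 4))
X = S_true @ A.T

Z = pca_whiten(X)          # whitening removes second-order structure
S_est, W = fast_ica(Z)     # ICA rotates Z using the remaining non-Gaussianity
```

After whitening, every direction has unit variance, so PCA alone cannot prefer one rotation over another. ICA picks the orthogonal rotation `W` that makes the components as non-Gaussian as possible, which is the sense in which it exploits structure left over after whitening. In this toy setting, each column of `S_est` should match one of the original sources up to sign and ordering.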