CL | Oct 10, 2023

Why bother with geometry? On the relevance of linear decompositions of Transformer embeddings

arXiv:2310.06977v121.2133 citations | h-index: 8 | Has Code

Originality Synthesis-oriented

AI Analysis

This work addresses the relevance of geometric interpretations in Transformer models for researchers in NLP and machine learning, highlighting incremental insights into model-specific characteristics.

The study investigated whether linear decompositions of Transformer embeddings are empirically meaningful by analyzing machine-translation decoders, finding that decomposition-derived indicators correlate with model performance but show high variability across runs.

A recent body of work has demonstrated that Transformer embeddings can be linearly decomposed into well-defined sums of factors, which can in turn be related to specific network inputs or components. There is, however, still a dearth of work studying whether these mathematical reformulations are empirically meaningful. In the present work, we study representations from machine-translation decoders using two such embedding decomposition methods. Our results indicate that, while decomposition-derived indicators effectively correlate with model performance, variation across different runs suggests a more nuanced take on this question. The high variability of our measurements indicates that geometry reflects model-specific characteristics more than it does sentence-specific computations, and that similar training conditions do not guarantee similar vector spaces.
