Banach-Tarski Embeddings and Transformers
This work offers a theoretical framework for understanding and improving transformer architectures in machine learning, with potential applications in natural language processing and other domains.
The authors introduce a new construction for embedding arbitrary recursive data structures into high-dimensional vectors, which provides an interpretable model for transformer latent states. They show that these embeddings can be decoded back to the original structures when the embedding dimension is sufficiently large, and that they can be manipulated directly to perform computations without decoding.
We introduce a new construction of embeddings of arbitrary recursive data structures into high-dimensional vectors. These embeddings provide an interpretable model for the latent state vectors of transformers. We demonstrate that these embeddings can be decoded to the original data structure when the embedding dimension is sufficiently large. This decoding algorithm has a natural implementation as a transformer. We also show that these embedding vectors can be manipulated directly to perform computations on the underlying data without decoding. As an example, we present an algorithm that constructs the embedded parse tree of an embedded token sequence using only vector operations in embedding space.
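To make the idea of embedding and then decoding a recursive structure concrete, here is a minimal sketch in Python with NumPy. It is not taken from the paper and should not be read as the authors' construction. It assumes a generic scheme in which a binary tree node is embedded as the sum of its children's embeddings, each rotated by its own fixed random orthogonal matrix. The names `TreeEmbedder` and `random_orthogonal`, the dimension, and the decoding threshold are all illustrative assumptions.

```python
import numpy as np


def random_orthogonal(dim, rng):
    """Return a random orthogonal matrix (the Q factor of a Gaussian matrix)."""
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q


class TreeEmbedder:
    """Hypothetical embedder for binary trees whose leaves are symbols.

    A tree is either a symbol (leaf) or a 2-tuple (left_subtree, right_subtree).
    """

    def __init__(self, symbols, dim=1024, seed=0):
        rng = np.random.default_rng(seed)
        self.symbols = list(symbols)
        # One random unit vector per leaf symbol.
        atoms = rng.standard_normal((len(self.symbols), dim))
        self.atoms = atoms / np.linalg.norm(atoms, axis=1, keepdims=True)
        # Fixed rotations that tag the left and right child positions.
        self.left = random_orthogonal(dim, rng)
        self.right = random_orthogonal(dim, rng)

    def encode(self, tree):
        """Embed a tree as a single vector of length dim."""
        if isinstance(tree, tuple):
            a, b = tree
            return self.left @ self.encode(a) + self.right @ self.encode(b)
        return self.atoms[self.symbols.index(tree)]

    def decode(self, vec, max_depth=8, threshold=0.5):
        """Recover the tree; reliable only when dim is large relative to tree size."""
        scores = self.atoms @ vec
        best = int(np.argmax(scores))
        # A large overlap with a symbol vector means this position is a leaf.
        # max_depth guards against unbounded recursion on noisy input.
        if scores[best] > threshold or max_depth == 0:
            return self.symbols[best]
        # Undo each rotation; the sibling's contribution is left as small noise.
        return (
            self.decode(self.left.T @ vec, max_depth - 1, threshold),
            self.decode(self.right.T @ vec, max_depth - 1, threshold),
        )


embedder = TreeEmbedder(["a", "b", "c"])
tree = (("a", "b"), "c")
vec = embedder.encode(tree)
print(vec.shape, embedder.decode(vec) == tree)
```

In this sketch, decoding works because independent random directions in a high-dimensional space are nearly orthogonal, so the sibling subtree shows up only as low-amplitude noise after the rotation is undone. That noise grows with the size of the tree, which is one way to see why a sufficiently large embedding dimension is needed for exact recovery.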