LG, AI, CL | Oct 23, 2024

ALTA: Compiler-Based Analysis of Transformers

DeepMind
arXiv:2410.18077v2 | 8 citations | h-index: 59 | Trans. Mach. Learn. Res.
Originality: Incremental advance
AI Analysis

This work provides tools for analyzing and training Transformers that benefit researchers in interpretability and algorithm design, though it builds incrementally on prior work such as the RASP language and the Tracr compiler.

The authors tackle the problem of understanding and constructing Transformers by proposing ALTA, a programming language with a compiler that maps programs to Transformer weights. This enables constructive demonstrations of length-invariant algorithms for parity and addition, as well as a solution to the SCAN benchmark, without intermediate scratchpad decoding steps.
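To make "length-invariant" and "loops" concrete, here is a minimal plain-Python sketch, not ALTA syntax and not the paper's construction, of parity computed by repeatedly applying one shared update step to every position until all positions halt. The function names `shared_layer` and `parity` are hypothetical. Reusing the same step on every iteration loosely mirrors a Universal Transformer with shared layer weights: the same fixed "program" handles inputs of any length, and only the number of iterations grows.

```python
from typing import List, Tuple


def shared_layer(bits: List[int], state: List[int], done: List[bool]) -> None:
    """One application of a hypothetical shared layer (synchronous update).

    A position that has not halted looks at its left neighbour; once that
    neighbour has halted, it XORs the neighbour's running parity with its own bit.
    """
    new_state = state[:]
    new_done = done[:]
    for i in range(len(bits)):
        if done[i]:
            continue
        if i == 0:
            new_state[i] = bits[i]
            new_done[i] = True
        elif done[i - 1]:
            new_state[i] = state[i - 1] ^ bits[i]
            new_done[i] = True
    state[:] = new_state
    done[:] = new_done


def parity(bits: List[int]) -> Tuple[int, int]:
    """Return (parity of bits, number of shared-layer applications used)."""
    state = [0] * len(bits)
    done = [False] * len(bits)
    steps = 0
    while not all(done):  # loop until every position has halted
        shared_layer(bits, state, done)
        steps += 1
    return (state[-1] if bits else 0), steps


print(parity([1, 0, 1, 1]))  # (1, 4): odd number of ones, one step per position
```

In this sequential sketch the step count grows linearly with input length. It only illustrates the loop-with-shared-weights idea and makes no claim about how many layers ALTA's compiled parity program actually uses.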

We propose a new programming language called ALTA and a compiler that can map ALTA programs to Transformer weights. ALTA is inspired by RASP, a language proposed by Weiss et al. (2021), and Tracr (Lindner et al., 2023), a compiler from RASP programs to Transformer weights. ALTA complements and extends this prior work, offering the ability to express loops and to compile programs to Universal Transformers, among other advantages. ALTA allows us to constructively show how Transformers can represent length-invariant algorithms for computing parity and addition, as well as a solution to the SCAN benchmark of compositional generalization tasks, without requiring intermediate scratchpad decoding steps. We also propose tools to analyze cases where the expressibility of an algorithm is established, but end-to-end training on a given training set fails to induce behavior consistent with the desired algorithm. To this end, we explore training from ALTA execution traces as a more fine-grained supervision signal. This enables additional experiments and theoretical analyses relating the learnability of various algorithms to data availability and modeling decisions, such as positional encodings. We make the ALTA framework (language specification, symbolic interpreter, and weight compiler) available to the community to enable further applications and insights.
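The idea of supervising on execution traces can be illustrated with a small, hypothetical example. The sketch below is plain Python and does not use ALTA's actual trace format or interpreter. It adds two equal-length, little-endian digit lists with a shared carry-propagation step, and records a snapshot of every position's intermediate state after each step. Each snapshot is the kind of per-step target that could, under this assumption, supervise intermediate layers in addition to the final output. The names `add_step`, `add_with_trace`, and the `(digit, carry, done)` state layout are illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical per-position state: (output digit, carry out, halted flag).
State = Tuple[int, int, bool]


def add_step(a: List[int], b: List[int], states: List[State]) -> List[State]:
    """One synchronous application of a hypothetical shared carry-propagation step."""
    new = list(states)
    for i in range(len(a)):
        if states[i][2]:
            continue  # this position has already halted
        if i == 0 or states[i - 1][2]:  # left neighbour's carry is available
            carry_in = 0 if i == 0 else states[i - 1][1]
            total = a[i] + b[i] + carry_in
            new[i] = (total % 10, total // 10, True)
    return new


def add_with_trace(a: List[int], b: List[int]) -> Tuple[List[int], List[List[State]]]:
    """Add little-endian digit lists; return (sum digits, trace of state snapshots)."""
    assert len(a) == len(b), "sketch assumes equal-length operands"
    states: List[State] = [(0, 0, False)] * len(a)
    trace = [states]  # snapshot before any step
    while not all(done for _, _, done in states):
        states = add_step(a, b, states)
        trace.append(states)  # one snapshot per loop iteration
    digits = [d for d, _, _ in states]
    if states and states[-1][1]:
        digits.append(states[-1][1])  # final carry becomes a new digit
    return digits, trace


digits, trace = add_with_trace([9, 9], [1, 0])  # 99 + 1, little-endian
print(digits)      # [0, 0, 1], i.e. 100
print(len(trace))  # 3: initial snapshot plus two steps
```

With only final-answer supervision, a learner sees `digits` alone. Trace supervision would additionally expose every intermediate snapshot in `trace`, which is a finer-grained signal about *how* the answer is computed.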

Code implementations: 1 repo