Post-Routing Arithmetic in Llama-3: Last-Token Result Writing and Rotation-Structured Digit Directions
This work targets interpretability and AI safety researchers who want to understand how large language models carry out arithmetic. Its contribution is incremental, since it builds on existing mechanistic analysis methods.
The study investigates how Meta-Llama-3-8B finalizes three-digit addition answers once cross-token routing is no longer causally relevant. Beyond a boundary near layer 17, the decoded sum is controlled almost entirely by the last input token, and late-layer self-attention is largely dispensable. Digit direction dictionaries vary with context but are related by an approximately orthogonal map within a shared low-rank subspace, and rotating directions through this map restores counterfactual edits that naive cross-context transfer fails to achieve.
We study three-digit addition in Meta-Llama-3-8B (base) under a one-token readout to characterize how arithmetic answers are finalized after cross-token routing becomes causally irrelevant. Causal residual patching and cumulative attention ablations localize a sharp boundary near layer~17: beyond it, the decoded sum is controlled almost entirely by the last input token, and late-layer self-attention is largely dispensable. In this post-routing regime, digit(-sum) direction dictionaries vary with the next-higher-digit context but are closely related by an approximately orthogonal map inside a shared low-rank subspace (low-rank Procrustes alignment). Causal digit editing matches this geometry: naive cross-context transfer fails, while rotating directions through the learned map restores strict counterfactual edits; negative controls do not restore them.
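To make the low-rank Procrustes alignment concrete, the following is a minimal NumPy sketch on synthetic data, not the procedure used in the paper. It assumes the shared subspace is taken as the top right singular vectors of the stacked dictionaries and that the orthogonal map is then fit by standard orthogonal Procrustes inside that subspace. The function names `low_rank_procrustes` and `rotate_direction`, the rank, and the toy dimensions are all hypothetical choices made for illustration.

```python
import numpy as np

def low_rank_procrustes(A, B, rank):
    """Fit an orthogonal map between two direction dictionaries in a shared subspace.

    A, B: arrays of shape (n_directions, d_model), one row per digit direction,
          taken from two different contexts (hypothetical inputs).
    Returns U of shape (d_model, rank), an orthonormal basis of the shared subspace,
    and R of shape (rank, rank), orthogonal, such that (A @ U) @ R approximates B @ U.
    """
    # Assumption: the shared subspace is spanned by the top right singular
    # vectors of the two dictionaries stacked row-wise.
    _, _, Vt = np.linalg.svd(np.vstack([A, B]), full_matrices=False)
    U = Vt[:rank].T
    A_low, B_low = A @ U, B @ U
    # Orthogonal Procrustes: minimize ||A_low R - B_low||_F over orthogonal R.
    P, _, Qt = np.linalg.svd(A_low.T @ B_low)
    return U, P @ Qt

def rotate_direction(v, U, R):
    """Rotate the in-subspace part of v; leave the out-of-subspace part unchanged."""
    coords = v @ U
    return v - coords @ U.T + (coords @ R) @ U.T

# Synthetic check: context B is an exact rotation of context A inside a
# 4-dimensional subspace of a 64-dimensional space (toy sizes, not the model's).
rng = np.random.default_rng(0)
d_model, rank, n_digits = 64, 4, 10
U_true = np.linalg.qr(rng.standard_normal((d_model, rank)))[0]
Q_true = np.linalg.qr(rng.standard_normal((rank, rank)))[0]
C = rng.standard_normal((n_digits, rank))
A = C @ U_true.T
B = (C @ Q_true) @ U_true.T

U, R = low_rank_procrustes(A, B, rank)
rotated = rotate_direction(A, U, R)
naive_error = np.linalg.norm(A - B)          # reuse context-A directions as-is
rotated_error = np.linalg.norm(rotated - B)  # map them through the learned rotation
```

In this toy setting the directions from context A differ from those of context B, while the rotated directions match them up to numerical precision, mirroring the contrast the abstract draws between naive cross-context transfer and transfer through the learned map. On real activations the fit would only be approximate, and the choice of rank would need to be validated.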