CL AI LG Jul 14, 2025

Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition

arXiv:2507.09875v2 4.92 citations h-index: 13

Originality: Incremental advance

AI Analysis

The paper offers interpretability insights for AI researchers, though the advance is incremental because it builds on prior work on induction heads.

The study investigated how large language models generalize to unseen tasks like off-by-one addition, uncovering a function induction mechanism that explains their performance and showing it is reused in broader tasks such as shifted multiple-choice QA and base-8 addition.

Large language models demonstrate the intriguing ability to perform unseen tasks via in-context learning. However, it remains unclear what mechanisms inside the model drive such task-level generalization. In this work, we approach this question through the lens of off-by-one addition (i.e., 1+1=3, 2+2=5, 3+3=?), a two-step, counterfactual task with an unexpected +1 function as a second step. Leveraging circuit-style interpretability techniques such as path patching, we analyze the models' internal computations behind their performance and present three key findings. First, we uncover a function induction mechanism that explains the model's generalization from standard addition to off-by-one addition. This mechanism resembles the structure of the induction head mechanism found in prior work and elevates it to a higher level of abstraction. Second, we show that the induction of the +1 function is governed by multiple attention heads in parallel, each of which emits a distinct piece of the +1 function. Finally, we find that this function induction mechanism is reused in a broader range of tasks, including synthetic tasks such as shifted multiple-choice QA and algorithmic tasks such as base-8 addition. Overall, our findings offer deeper insights into how reusable and composable structures within language models enable task-level generalization.
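To make the two-step structure of off-by-one addition concrete, the following is a minimal sketch of how such an in-context prompt could be constructed. The helper names (`off_by_one_sum`, `build_prompt`) and the exact prompt layout (one `a+b=c` demonstration per line) are assumptions for illustration, not the authors' code or evaluation format.

```python
def off_by_one_sum(a: int, b: int) -> int:
    """Two-step task: standard addition, then the unexpected +1 step."""
    standard = a + b      # step 1: ordinary addition
    return standard + 1   # step 2: the counterfactual +1 function


def build_prompt(demos, query):
    """Format in-context demonstrations followed by an unanswered query.

    The line-per-example layout is a hypothetical choice for this sketch.
    """
    lines = [f"{a}+{b}={off_by_one_sum(a, b)}" for a, b in demos]
    qa, qb = query
    lines.append(f"{qa}+{qb}=")
    return "\n".join(lines)


# Reproduces the example from the abstract: 1+1=3, 2+2=5, 3+3=?
prompt = build_prompt([(1, 1), (2, 2)], (3, 3))
print(prompt)
print("expected completion:", off_by_one_sum(3, 3))
```

A model that has induced the +1 function from the demonstrations would complete the final line with 7 rather than the standard sum 6.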
