LG · AI · CL · Nov 11, 2024

Universal Response and Emergence of Induction in LLMs

arXiv:2411.07071v1 · 1 citation · h-index: 11
Originality: Incremental advance
AI Analysis

This work addresses the challenge of circuit decomposition for induction in LLMs, providing insights for researchers in interpretability and AI safety, though it is incremental as it builds on existing probing methods.

The study examines induction behavior in large language models (LLMs) by probing their response to weak single-token perturbations of the residual stream. It finds a universal regime in which the response is scale-invariant under changes in perturbation strength, and it identifies induction signatures in Gemma-2-2B, Llama-3.2-3B, and GPT-2-XL that emerge gradually in intermediate layers.

While induction is considered a key mechanism for in-context learning in LLMs, understanding its precise circuit decomposition beyond toy models remains elusive. Here, we study the emergence of induction behavior within LLMs by probing their response to weak single-token perturbations of the residual stream. We find that LLMs exhibit a robust, universal regime in which their response remains scale-invariant under changes in perturbation strength, thereby allowing us to quantify the build-up of token correlations throughout the model. By applying our method, we observe signatures of induction behavior within the residual stream of Gemma-2-2B, Llama-3.2-3B, and GPT-2-XL. Across all models, we find that these induction signatures gradually emerge within intermediate layers and identify the relevant model sections composing this behavior. Our results provide insights into the collective interplay of components within LLMs and serve as a benchmark for large-scale circuit analysis.
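
To make the idea of a scale-invariant response concrete, the following is a minimal sketch on a toy model, not the paper's code or models. It assumes a hypothetical stack of residual layers (`toy_layers`) with causal token mixing, perturbs one token of the input stream by `eps * direction`, and divides the resulting change at every layer and position by `eps`. In the weak-perturbation regime this normalized response should not depend on `eps`. All names, shapes, and the toy architecture are assumptions made for illustration.

```python
import numpy as np

def toy_layers(x, weights, mix):
    """Hypothetical stand-in for an LLM: each residual layer mixes tokens
    causally (via `mix`) and applies a tanh nonlinearity."""
    states = [x]
    for w in weights:
        x = x + np.tanh(mix @ x @ w)
        states.append(x)
    return states

def normalized_response(x, weights, mix, token, direction, eps):
    """Perturb one token by eps * direction and return the response norm
    divided by eps, as an array of shape (num_layers + 1, num_tokens)."""
    base = toy_layers(x, weights, mix)
    perturbed_input = x.copy()
    perturbed_input[token] += eps * direction
    pert = toy_layers(perturbed_input, weights, mix)
    return np.array(
        [np.linalg.norm(p - b, axis=-1) / eps for p, b in zip(pert, base)]
    )

rng = np.random.default_rng(0)
num_tokens, dim, num_layers = 8, 16, 4
x = rng.normal(size=(num_tokens, dim))
weights = [rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(num_layers)]

# Causal averaging: position t only sees positions 0..t.
mix = np.tril(np.ones((num_tokens, num_tokens)))
mix /= mix.sum(axis=1, keepdims=True)

direction = rng.normal(size=dim)
direction /= np.linalg.norm(direction)

# Same perturbation direction and token, two different strengths.
r_small = normalized_response(x, weights, mix, 2, direction, 1e-4)
r_smaller = normalized_response(x, weights, mix, 2, direction, 1e-5)
```

Comparing `r_small` and `r_smaller` row by row shows how the perturbation of token 2 spreads to later positions layer by layer, while earlier positions stay unaffected because of the causal mixing. In a real model the same measurement would be taken on the actual residual stream, for example through forward hooks, which this sketch does not attempt.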

