Investigating Neurons and Heads in Transformer-based LLMs for Typographical Errors
This work studies the internal mechanisms by which LLMs correct errors. The contribution is incremental: it builds on existing knowledge of neuron and attention-head functions.
The paper investigates how transformer-based LLMs encode inputs containing typographical errors. It finds that specific neurons and attention heads recognize and fix typos using local and global contexts: neurons in the middle layers handle the core of typo-fixing, while heads consider the context broadly.
This paper investigates how LLMs encode inputs with typos. We hypothesize that specific neurons and attention heads recognize typos and fix them internally using local and global contexts. We introduce a method to identify typo neurons and typo heads, which are active when inputs contain typos. Our experimental results suggest the following: 1) LLMs can fix typos with local contexts when the typo neurons in either the early or the late layers are activated, even if those in the other are not. 2) Typo neurons in the middle layers are responsible for the core of typo-fixing with global contexts. 3) Typo heads fix typos by considering the context broadly rather than focusing on specific tokens. 4) Typo neurons and typo heads work not only for typo-fixing but also for understanding general contexts.
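The abstract does not describe the identification procedure itself. As a purely illustrative sketch, and not the authors' method, one simple way to look for neurons that are more active on inputs with typos is to compare per-neuron activations on paired clean and typo versions of the same inputs and rank neurons by the mean difference. Everything below is an assumption made for illustration: the function names `score_typo_neurons` and `top_k_neurons`, the mean-difference criterion, and the synthetic activations, which stand in for values that would in practice be recorded from a model's feed-forward layers.

```python
import random
from statistics import mean


def score_typo_neurons(clean_acts, typo_acts):
    """Per-neuron mean activation difference (typo minus clean).

    clean_acts, typo_acts: lists of activation vectors, one per input.
    Row i of each list is assumed to come from the same sentence,
    without and with a typo respectively.
    """
    n_neurons = len(clean_acts[0])
    return [
        mean(t[j] - c[j] for c, t in zip(clean_acts, typo_acts))
        for j in range(n_neurons)
    ]


def top_k_neurons(scores, k):
    """Indices of the k highest-scoring neurons, highest first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Synthetic stand-in data (hypothetical): 8 neurons, 50 paired inputs.
# Neurons 2 and 5 are constructed to respond more strongly to typo inputs.
rng = random.Random(0)
clean = [[rng.gauss(0.0, 0.1) for _ in range(8)] for _ in range(50)]
typo = [
    [v + (1.0 if j in (2, 5) else 0.0) + rng.gauss(0.0, 0.1)
     for j, v in enumerate(row)]
    for row in clean
]

scores = score_typo_neurons(clean, typo)
candidates = top_k_neurons(scores, k=2)
print(candidates)
```

On this synthetic data the two planted neurons are recovered as candidates. The same comparison could be run separately for each layer, which is one way a layer-wise picture like the early, middle, and late distinction in the abstract might be examined.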