CL · AI · LG · Oct 31, 2024

Interpretable Next-token Prediction via the Generalized Induction Head

Eunji Kim, Sriya Mantena, Weiwei Yang, Chandan Singh, Sungroh Yoon, Jianfeng Gao

arXiv:2411.00066v2 · 2.72 citations · h-index: 12 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for interpretable models in high-stakes domains, with applications in language modeling and neuroscience, and represents an incremental advance in bridging interpretability and performance.

The paper tackles interpretability in next-token prediction by proposing the Generalized Induction-Head Model (GIM). In language modeling, GIM improves next-token prediction by up to 25 percentage points over interpretable baselines; in fMRI response prediction, it improves performance by 20%.

While large transformer models excel in predictive performance, their lack of interpretability restricts their usefulness in high-stakes domains. To remedy this, we propose the Generalized Induction-Head Model (GIM), an interpretable model for next-token prediction inspired by the observation of "induction heads" in LLMs. GIM is a retrieval-based module that identifies similar sequences in the input context by combining exact n-gram matching and fuzzy matching based on a neural similarity metric. We evaluate GIM in two settings: language modeling and fMRI response prediction. In language modeling, GIM improves next-token prediction by up to 25%p over interpretable baselines, significantly narrowing the gap with black-box LLMs. In an fMRI setting, GIM improves neural response prediction by 20% and offers insights into the language selectivity of the brain. GIM represents a significant step toward uniting interpretability and performance across domains. The code is available at https://github.com/ejkim47/generalized-induction-head.
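The abstract describes GIM as a retrieval module that combines exact n-gram matching with fuzzy matching under a neural similarity metric. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation (the linked repository has that). All function names are hypothetical. `jaccard` is a toy token-overlap stand-in for the learned neural similarity metric. The policy of falling back to fuzzy matching only when the exact match is shorter than `min_exact` tokens is also an assumption.

```python
from collections import Counter
from typing import Callable, Sequence

Similarity = Callable[[Sequence[str], Sequence[str]], float]


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Toy stand-in (assumption) for a neural similarity metric: token-set overlap."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def exact_ngram_next_tokens(context: Sequence[str], max_n: int) -> tuple[Counter, int]:
    """Induction-style exact matching: find the longest context suffix (<= max_n
    tokens) that occurred earlier, and count the tokens that followed it."""
    for n in range(min(max_n, len(context) - 1), 0, -1):
        suffix = tuple(context[-n:])
        counts: Counter = Counter()
        # Every start < len(context) - n is an earlier occurrence with a follower token.
        for start in range(len(context) - n):
            if tuple(context[start:start + n]) == suffix:
                counts[context[start + n]] += 1
        if counts:
            return counts, n
    return Counter(), 0


def fuzzy_next_tokens(context: Sequence[str], window: int,
                      similarity: Similarity, threshold: float) -> Counter:
    """Fuzzy matching: score earlier windows (not overlapping the query) by
    similarity to the last `window` tokens; weight their followers by that score."""
    query = context[-window:]
    scores: Counter = Counter()
    for start in range(len(context) - 2 * window + 1):
        score = similarity(query, context[start:start + window])
        if score >= threshold:
            scores[context[start + window]] += score
    return scores


def predict_next(context: Sequence[str], max_n: int = 8, min_exact: int = 2,
                 window: int = 3, threshold: float = 0.4,
                 similarity: Similarity = jaccard) -> tuple[dict, str]:
    """Return a next-token distribution and which matcher produced it."""
    counts, n = exact_ngram_next_tokens(context, max_n)
    source, method = counts, f"exact(n={n})"
    # Assumed policy: use fuzzy matching when the exact match is too short.
    if n < min_exact:
        fuzzy = fuzzy_next_tokens(context, window, similarity, threshold)
        if fuzzy:
            source, method = fuzzy, "fuzzy"
    total = sum(source.values())
    dist = {tok: c / total for tok, c in source.items()} if total else {}
    return dist, method


if __name__ == "__main__":
    print(predict_next("the cat sat on the mat . the cat sat on the".split()))
    # → ({'mat': 1.0}, 'exact(n=5)')
    print(predict_next("the dog ran home . the dog walked".split()))
    # → ({'home': 1.0}, 'fuzzy')
```

In the second call, "walked" never appeared earlier, so exact matching finds nothing. The fuzzy matcher still retrieves "the dog ran" as a similar earlier span and predicts its continuation. The prediction is interpretable because every output token traces back to a specific matched span in the context.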

