CL · Dec 20, 2024

Can Input Attributions Explain Inductive Reasoning in In-Context Learning?

arXiv:2412.15628v5 · 4 citations · h-index: 15 · ACL
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of explaining reasoning processes in large language models, which is relevant to interpretability researchers, but the advance is incremental because it builds on existing attribution methods.

The paper tackles the problem of interpreting which examples in few-shot in-context learning contribute to task identification. It designs synthetic diagnostic tasks of inductive reasoning and finds that a simple input attribution method works best, and that gradient-based methods become less effective as models grow larger.

Interpreting the internal process of neural models has long been a challenge. This challenge remains relevant in the era of large language models (LLMs) and in-context learning (ICL); for example, ICL poses a new issue of interpreting which of the few-shot examples contributed to identifying or solving the task. To this end, in this paper, we design synthetic diagnostic tasks of inductive reasoning, inspired by the generalization tests typically adopted in psycholinguistics. Here, most in-context examples are ambiguous w.r.t. their underlying rule, and one critical example disambiguates it. The question is whether conventional input attribution (IA) methods can track such a reasoning process, i.e., identify the influential example, in ICL. Our experiments provide several practical findings; for example, a certain simple IA method works best, and, in general, the larger the model, the harder it is to interpret ICL with gradient-based IA methods.
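The diagnostic setup described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the two candidate rules (`double`, `square`), the rule-elimination stand-in for a language model (`target_probability`), and the choice of leave-one-out occlusion as the attribution method. The abstract does not say which IA method performed best, and a real experiment would score an actual LLM's output probability rather than this toy function.

```python
# Hypothetical candidate rules. In the paper's setting the rule is latent;
# here it is enumerated explicitly so the example stays self-contained.
RULES = {
    "double": lambda x: x + x,
    "square": lambda x: x * x,
}

def target_probability(examples, query, target):
    """Toy stand-in for a model: keep every rule consistent with the
    in-context examples and spread probability uniformly over them."""
    consistent = [f for f in RULES.values()
                  if all(f(x) == y for x, y in examples)]
    if not consistent:
        return 0.0
    hits = sum(1 for f in consistent if f(query) == target)
    return hits / len(consistent)

def leave_one_out_attribution(examples, query, target):
    """Score each example by how much the target probability drops
    when that example is removed from the context."""
    full = target_probability(examples, query, target)
    scores = []
    for i in range(len(examples)):
        ablated = examples[:i] + examples[i + 1:]
        scores.append(full - target_probability(ablated, query, target))
    return scores

# (2, 4) and (0, 0) fit both rules; only (3, 9) rules out "double".
examples = [(2, 4), (0, 0), (3, 9), (2, 4)]
scores = leave_one_out_attribution(examples, query=5, target=25)
top = max(range(len(scores)), key=scores.__getitem__)
print(scores, top)
```

In this toy case only the disambiguating example receives a nonzero score, which is the behavior the diagnostic tasks test for: an attribution method succeeds if its top-ranked example is the critical one.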

