CL | AI | Jan 23

Jacobian Scopes: token-level causal attributions in LLMs

arXiv:2601.16407v1 | 2 citations | h-index: 3 | Has Code
Originality: Incremental advance
AI Analysis

This work targets LLM interpretability for researchers and practitioners. It is an incremental advance: it builds on existing gradient-based attribution methods and contributes new variants of them.

The authors address the challenge of identifying which prior tokens most strongly influence a large language model's predictions. They propose Jacobian Scopes, a suite of gradient-based, token-level causal attribution methods, and apply them to case studies including instruction understanding and translation. Among the reported findings are cases where the attributions point to implicit political biases.

Large language models (LLMs) make next-token predictions based on clues present in their context, such as semantic descriptions and in-context examples. Yet, elucidating which prior tokens most strongly influence a given prediction remains challenging due to the proliferation of layers and attention heads in modern architectures. We propose Jacobian Scopes, a suite of gradient-based, token-level causal attribution methods for interpreting LLM predictions. By analyzing the linearized relations of final hidden state with respect to inputs, Jacobian Scopes quantify how input tokens influence a model's prediction. We introduce three variants - Semantic, Fisher, and Temperature Scopes - which respectively target sensitivity of specific logits, the full predictive distribution, and model confidence (inverse temperature). Through case studies spanning instruction understanding, translation and in-context learning (ICL), we uncover interesting findings, such as when Jacobian Scopes point to implicit political biases. We believe that our proposed methods also shed light on recently debated mechanisms underlying in-context time-series forecasting. Our code and interactive demonstrations are publicly available at https://github.com/AntonioLiu97/JacobianScopes.
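To make the idea in the abstract concrete, here is a minimal sketch of scoring input tokens by how sensitive one specific logit is to each token's embedding. It is not the authors' implementation: the abstract does not give the exact definitions of the Semantic, Fisher, or Temperature Scopes. The toy model, the function names (`toy_logits`, `logit_sensitivity`), and the use of finite differences instead of automatic differentiation are all assumptions made for illustration.

```python
import math

def toy_logits(embeddings, position_weights, unembed):
    """Hypothetical stand-in for an LLM: mix token embeddings, apply tanh, project to logits."""
    dim = len(embeddings[0])
    # "Final hidden state": tanh of a position-weighted sum of the input embeddings.
    hidden = [
        math.tanh(sum(w * emb[i] for w, emb in zip(position_weights, embeddings)))
        for i in range(dim)
    ]
    # One logit per row of the (toy) unembedding matrix.
    return [sum(u * h for u, h in zip(row, hidden)) for row in unembed]

def logit_sensitivity(embeddings, position_weights, unembed, target, eps=1e-5):
    """Per-token score: L2 norm of d(logit[target]) / d(embedding_t), via central differences."""
    scores = []
    for t, emb in enumerate(embeddings):
        grad = []
        for i in range(len(emb)):
            plus = [list(e) for e in embeddings]
            minus = [list(e) for e in embeddings]
            plus[t][i] += eps
            minus[t][i] -= eps
            up = toy_logits(plus, position_weights, unembed)[target]
            down = toy_logits(minus, position_weights, unembed)[target]
            grad.append((up - down) / (2 * eps))
        scores.append(math.sqrt(sum(g * g for g in grad)))
    return scores

# Assumed toy data: three tokens with 2-d embeddings and a two-word vocabulary.
embeddings = [[0.1, -0.2], [0.3, 0.1], [-0.1, 0.2]]
position_weights = [0.1, 1.0, 0.3]   # token 1 dominates the toy mixing by construction
unembed = [[1.0, 0.0], [0.0, 1.0]]

scores = logit_sensitivity(embeddings, position_weights, unembed, target=0)
most_influential = max(range(len(scores)), key=scores.__getitem__)
```

In this toy setup the scores are proportional to the mixing weights, so the second token receives the highest attribution. With a real model one would obtain the same per-token gradients from automatic differentiation rather than finite differences; the authors' repository linked above is the reference for their actual method.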

