LG · CL · Feb 7, 2025

Position-aware Automatic Circuit Discovery

arXiv:2502.04577v1 · 12 citations · h-index: 55 · ACL
Originality: Incremental advance
AI Analysis

The work addresses a gap in interpretability methods for language models by enabling more accurate analysis of position-dependent mechanisms, though it is an incremental improvement on existing circuit discovery techniques.

The paper tackles a limitation of existing circuit discovery methods: they assume position invariance, which hinders their ability to capture mechanisms that vary across input positions. The authors propose position-aware methods, namely an extension of edge attribution patching and dataset schemas, and report better trade-offs between circuit size and faithfulness than prior work.
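To make "differentiating between token positions" concrete, here is a minimal NumPy sketch, assuming the standard first-order edge attribution patching (EAP) approximation: the score of an edge u → v is the change in u's output under corruption, dotted with the gradient of the task metric with respect to v's input on the clean run. The function name `eap_edge_scores` and its arguments are hypothetical, sign conventions vary between implementations, and the sketch only treats an edge at each position separately. It does not model the cross-position interactions that the paper's method also targets, and it is not the authors' implementation.

```python
import numpy as np


def eap_edge_scores(clean_up, corrupt_up, grad_down, position_aware=True):
    """First-order attribution-patching estimate for one edge u -> v (sketch).

    clean_up, corrupt_up: output of upstream node u on the clean and corrupted
        inputs, shape [n_positions, d_model].
    grad_down: gradient of the task metric w.r.t. the input of downstream
        node v on the clean run, shape [n_positions, d_model].
    """
    # Change in u's output when the input is corrupted.
    delta = corrupt_up - clean_up
    # Dot product over the model dimension gives one estimate per position.
    per_position = (delta * grad_down).sum(axis=-1)
    if position_aware:
        # Position-aware: keep one score per token position.
        return per_position
    # Position-invariant: collapse all positions into a single edge score.
    return per_position.sum()


# Toy inputs (made up for illustration): opposite effects at positions 0 and 1.
clean = np.zeros((3, 2))
corrupt = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
grad = np.array([[2.0, 0.0], [0.0, -2.0], [5.0, 5.0]])
per_pos = eap_edge_scores(clean, corrupt, grad)
summed = eap_edge_scores(clean, corrupt, grad, position_aware=False)
```

On this toy input the per-position scores are 2, -2 and 0, while the summed, position-invariant score is 0. The edge matters at two positions, but summing over positions hides it. This is the kind of information a position-invariant score can lose.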

A widely used strategy to discover and understand language model mechanisms is circuit analysis. A circuit is a minimal subgraph of a model's computation graph that executes a specific task. We identify a gap in existing circuit discovery methods: they assume circuits are position-invariant, treating model components as equally relevant across input positions. This limits their ability to capture cross-positional interactions or mechanisms that vary across positions. To address this gap, we propose two improvements to incorporate positionality into circuits, even on tasks containing variable-length examples. First, we extend edge attribution patching, a gradient-based method for circuit discovery, to differentiate between token positions. Second, we introduce the concept of a dataset schema, which defines token spans with similar semantics across examples, enabling position-aware circuit discovery in datasets with variable-length examples. We additionally develop an automated pipeline for schema generation and application using large language models. Our approach enables fully automated discovery of position-sensitive circuits, yielding better trade-offs between circuit size and faithfulness compared to prior work.
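The abstract's idea of a dataset schema can be illustrated with a small pure-Python sketch. It assumes every example has already been labeled with the token positions of each named span, whether by hand or by an LLM-based labeler, which this sketch does not implement. Per-position scores are then pooled into per-span scores, so examples of different lengths can be compared span by span. The schema names, `LabeledExample`, and the choice to sum scores within a span and average across examples are all hypothetical illustration choices, not the paper's definitions.

```python
from dataclasses import dataclass

# Hypothetical schema: span names shared by every example in the dataset.
SCHEMA = ["subject", "verb", "object"]


@dataclass
class LabeledExample:
    tokens: list
    spans: dict  # span name -> range of token positions in this example


def span_scores(per_position, example):
    """Pool per-position scores into one score per schema span (by summing)."""
    if len(per_position) != len(example.tokens):
        raise ValueError("need exactly one score per token")
    return {name: sum(per_position[i] for i in example.spans[name]) for name in SCHEMA}


def average_over_dataset(all_scores, examples):
    """Average span-level scores across examples that may differ in length."""
    totals = {name: 0.0 for name in SCHEMA}
    for scores, ex in zip(all_scores, examples):
        for name, value in span_scores(scores, ex).items():
            totals[name] += value
    return {name: total / len(examples) for name, total in totals.items()}


# Two toy examples of different lengths, labeled with the same schema.
ex1 = LabeledExample(
    tokens=["The", "old", "cat", "chased", "mice"],
    spans={"subject": range(0, 3), "verb": range(3, 4), "object": range(4, 5)},
)
ex2 = LabeledExample(
    tokens=["Dogs", "chase", "the", "cat"],
    spans={"subject": range(0, 1), "verb": range(1, 2), "object": range(2, 4)},
)
# Made-up per-position scores for a single edge, one list per example.
result = average_over_dataset([[0.5, 0.5, 1.0, 0.5, 2.0], [4.0, 1.5, 0.0, 0.0]], [ex1, ex2])
```

Here the 5-token and 4-token examples both reduce to three span scores. Averaged, they give 3.0 for `subject`, 1.0 for `verb` and 1.0 for `object`. Raw position indices could not be aligned this way, because position 2 is part of the subject in one example and the object in the other.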

Code Implementations: 1 repo
