CL · LG · May 18, 2025

$K$-MSHC: Unmasking Minimally Sufficient Head Circuits in Large Language Models with Experiments on Syntactic Classification Tasks

arXiv:2505.12268v2 · 1 citation · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses the challenge of interpretability in large language models for researchers and practitioners by providing a method to uncover minimally sufficient neural components, though it is incremental as it builds on existing circuit analysis techniques.

The authors address the problem of identifying which attention heads in mid-sized language models are crucial for specific classification tasks. They introduce the K-MSHC methodology and the Search-K-MSHC algorithm and apply them to Gemma-9B, revealing distinct, task-specific head circuits: grammar tasks rely mostly on early layers, while arithmetic verification shows activity distributed across the network.

Understanding which neural components drive specific capabilities in mid-sized language models ($\leq$10B parameters) remains a key challenge. We introduce the $(\bm{K}, \epsilon)$-Minimum Sufficient Head Circuit ($K$-MSHC), a methodology to identify minimal sets of attention heads crucial for classification tasks, as well as Search-K-MSHC, an efficient algorithm for discovering these circuits. Applying our Search-K-MSHC algorithm to Gemma-9B, we analyze three syntactic task families: grammar acceptability, arithmetic verification, and arithmetic word problems. Our findings reveal distinct task-specific head circuits, with grammar tasks predominantly utilizing early layers, word problems showing pronounced activity in both shallow and deep regions, and arithmetic verification demonstrating a more distributed pattern across the network. We discover non-linear circuit overlap patterns, where different task pairs share computational components at varying levels of importance. While grammar and arithmetic share many "weak" heads, arithmetic and word problems share more consistently critical "strong" heads. Importantly, we find that each task maintains dedicated "super-heads" with minimal cross-task overlap, suggesting that syntactic and numerical competencies emerge from specialized yet partially reusable head circuits.
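The abstract does not spell out the formal $(K, \epsilon)$ definition or the Search-K-MSHC procedure, so the following is only a rough illustration of the general idea of a "minimal sufficient" head set, not the authors' algorithm. It uses plain greedy backward elimination: starting from all heads, it drops a head whenever the score stays within `eps` of the full-model score. The function `toy_score`, the 3-layer by 4-head grid, and the two "important" heads are all hypothetical stand-ins for a real evaluation of Gemma-9B with only the given heads active.

```python
from typing import Callable, FrozenSet, Iterable, Set, Tuple

Head = Tuple[int, int]  # (layer index, head index)


def prune_to_sufficient_set(
    heads: Iterable[Head],
    score: Callable[[FrozenSet[Head]], float],
    eps: float,
) -> Set[Head]:
    """Greedy backward elimination (NOT the paper's Search-K-MSHC).

    Returns a set of heads from which no single head can be removed
    without the score falling more than `eps` below the full-set score.
    """
    all_heads = sorted(set(heads))
    full_score = score(frozenset(all_heads))
    kept = set(all_heads)
    for head in all_heads:
        trial = kept - {head}
        # Drop the head only if performance stays within eps of the full model.
        if full_score - score(frozenset(trial)) <= eps:
            kept = trial
    return kept


# Hypothetical toy setup: only two heads matter for the task.
IMPORTANT = {(0, 1), (2, 3)}


def toy_score(active: FrozenSet[Head]) -> float:
    """Stand-in for task accuracy with only `active` heads enabled."""
    return 0.5 + 0.25 * len(IMPORTANT & active)


all_heads = [(layer, h) for layer in range(3) for h in range(4)]
circuit = prune_to_sufficient_set(all_heads, toy_score, eps=0.05)
print(sorted(circuit))
```

Greedy elimination only guarantees that no single remaining head is removable; it does not guarantee the globally smallest set, and each call to `score` would be a full model evaluation in practice, which is presumably why the paper proposes a more efficient search.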

