SEAI | Dec 23, 2025

Neuron-Guided Interpretation of Code LLMs: Where, Why, and How?

arXiv:2512.19980v1 | 2 citations | h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses interpretability for code LLMs. The advance is incremental: it adapts neuron analysis from NLP to the unique challenges of programming languages.

The paper interprets code language models by localizing language-specific neurons and concept layers. It finds that lower layers encode syntax while middle layers capture semantic abstractions shared across languages, and it demonstrates utility in tasks such as fine-tuning and clone detection, with consistent gains.

Code language models excel on code intelligence tasks, yet their internal interpretability is underexplored. Existing neuron interpretability techniques from NLP are suboptimal for source code because of programming languages' formal, hierarchical, and executable nature. We empirically investigate code LLMs at the neuron level, localizing language-specific neurons (selectively responsive to one language) and concept layers (feed-forward layers encoding language-agnostic code representations). We analyze Llama-3.1-8B and Qwen2.5-Coder-32B on multilingual inputs in C++, Java, Python, Go, and JavaScript, measuring neuron selectivity and layerwise contributions during generation. We find (1) neurons specialized for individual languages alongside a universal subset supporting general-purpose generation; and (2) lower layers mainly encode language-specific syntax, while middle layers capture semantic abstractions shared across languages, emerging as concept layers. We demonstrate utility on three tasks: neuron-guided fine-tuning for code generation, clone detection via concept-layer embeddings, and concept-layer-guided transfer for code summarization, each yielding consistent gains in multilingual settings.
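The distinction the abstract draws between language-specific neurons and a universal subset can be illustrated with a minimal sketch. The abstract does not state the selectivity metric the authors use, so the firing-rate rule below, the thresholds, and the names `classify_neurons`, `active_threshold`, `min_rate`, and `max_other_rate` are all assumptions made for illustration, and the activation values are made-up toy numbers rather than measurements from either model.

```python
from statistics import mean


def classify_neurons(activations, active_threshold=0.5,
                     min_rate=0.6, max_other_rate=0.2):
    """Label each neuron as language-specific, universal, or unassigned.

    activations: dict mapping a language name to a list of samples, where
    each sample is a list of per-neuron activation values.
    NOTE: this firing-rate rule and its thresholds are illustrative
    assumptions, not the metric used in the paper.
    """
    langs = list(activations)
    n_neurons = len(activations[langs[0]][0])
    labels = {}
    for j in range(n_neurons):
        # Fraction of samples per language in which neuron j is "active".
        rates = {
            lang: mean([1.0 if sample[j] > active_threshold else 0.0
                        for sample in activations[lang]])
            for lang in langs
        }
        hot = [lang for lang in langs if rates[lang] >= min_rate]
        cold = [lang for lang in langs if rates[lang] <= max_other_rate]
        if len(hot) == 1 and len(cold) == len(langs) - 1:
            labels[j] = hot[0]          # fires for one language only
        elif len(hot) == len(langs):
            labels[j] = "universal"     # fires for every language
        else:
            labels[j] = "unassigned"    # no clear pattern
    return labels


# Toy activations for three hypothetical neurons and two languages.
toy = {
    "Python": [[0.9, 0.8, 0.1], [0.7, 0.9, 0.0]],
    "Go":     [[0.1, 0.9, 0.2], [0.0, 0.6, 0.1]],
}
toy_labels = classify_neurons(toy)
print(toy_labels)
```

On the toy data, neuron 0 is active only on Python samples, neuron 1 is active on every sample in both languages, and neuron 2 is never active, so the three neurons land in the three different categories. In practice the activations would come from feed-forward layers of a real model on multilingual code inputs, and the thresholds would need tuning.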

