CL, Jan 27

Identifying and Transferring Reasoning-Critical Neurons: Improving LLM Inference Reliability via Activation Steering

arXiv:2601.19847v1, 1 citation, h-index: 39
Originality: Highly original
AI Analysis

This paper addresses the challenge of achieving reliable reasoning performance from LLMs in practical applications such as mathematics and coding, offering an efficient alternative to post-training and expensive sampling methods.

The paper tackles unreliable reasoning in large language models by proposing AdaRAS, a lightweight test-time framework that selectively intervenes on neuron activations. It reports consistent improvements, including gains of over 13% on the AIME-24 and AIME-25 benchmarks.

Despite the strong reasoning capabilities of recent large language models (LLMs), achieving reliable performance on challenging tasks often requires post-training or computationally expensive sampling strategies, limiting their practical efficiency. In this work, we first show that a small subset of neurons in LLMs exhibits strong predictive correlations with reasoning correctness. Based on this observation, we propose AdaRAS (Adaptive Reasoning Activation Steering), a lightweight test-time framework that improves reasoning reliability by selectively intervening on neuron activations. AdaRAS identifies Reasoning-Critical Neurons (RCNs) via a polarity-aware mean-difference criterion and adaptively steers their activations during inference, enhancing incorrect reasoning traces while avoiding degradation on already-correct cases. Experiments on 10 mathematics and coding benchmarks demonstrate consistent improvements, including over 13% gains on AIME-24 and AIME-25. Moreover, AdaRAS exhibits strong transferability across datasets and scalability to stronger models, outperforming post-training methods without additional training or sampling cost.
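The two steps named in the abstract, selecting neurons by a polarity-aware mean difference and steering them only when needed, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `select_rcns` and `steer`, the top-k selection rule, the additive update with strength `alpha`, and the confidence-threshold gate are all assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np

def select_rcns(acts_correct, acts_incorrect, k):
    """Pick k candidate neurons by mean activation difference (hypothetical helper).

    acts_correct, acts_incorrect: arrays of shape (n_samples, n_neurons)
    holding activations collected on correct and incorrect reasoning traces.
    Returns (indices, polarity), where polarity is +1 if the neuron is more
    active on correct traces and -1 if it is more active on incorrect ones.
    """
    diff = acts_correct.mean(axis=0) - acts_incorrect.mean(axis=0)
    # Rank by magnitude so strongly negative neurons are kept too;
    # the sign is retained separately as the steering direction.
    idx = np.argsort(-np.abs(diff))[:k]
    polarity = np.sign(diff[idx])
    return idx, polarity

def steer(hidden, idx, polarity, alpha, confidence, threshold=0.5):
    """Shift the selected neurons toward the 'correct' direction.

    The gate on `confidence` is an assumed stand-in for the adaptive part:
    traces already judged likely correct are left untouched.
    """
    if confidence >= threshold:
        return hidden
    out = hidden.copy()
    out[idx] += alpha * polarity
    return out

# Toy data: neuron 2 fires more on correct traces, neuron 0 on incorrect ones.
acts_correct = np.array([[0.0, 1.0, 3.0], [0.0, 1.0, 5.0]])
acts_incorrect = np.array([[2.0, 1.0, 1.0], [2.0, 1.0, 1.0]])
idx, polarity = select_rcns(acts_correct, acts_incorrect, k=2)

hidden = np.array([1.0, 1.0, 1.0])
steered = steer(hidden, idx, polarity, alpha=0.5, confidence=0.2)
untouched = steer(hidden, idx, polarity, alpha=0.5, confidence=0.9)
```

In a real model the same idea would typically be applied through a forward hook on the chosen layer, and the gating signal would come from whatever correctness predictor the method uses; both are outside what the abstract describes.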

