Identifying and Transferring Reasoning-Critical Neurons: Improving LLM Inference Reliability via Activation Steering
This work addresses the challenge of achieving reliable LLM reasoning in practical domains such as mathematics and coding, offering an efficient alternative to post-training and expensive sampling methods.
The paper tackles unreliable reasoning in large language models by proposing AdaRAS, a lightweight test-time framework that selectively intervenes on neuron activations. It reports consistent improvements, including gains of over 13% on the AIME-24 and AIME-25 benchmarks.
Despite the strong reasoning capabilities of recent large language models (LLMs), achieving reliable performance on challenging tasks often requires post-training or computationally expensive sampling strategies, limiting their practical efficiency. In this work, we first show that a small subset of neurons in LLMs exhibits strong predictive correlations with reasoning correctness. Based on this observation, we propose AdaRAS (Adaptive Reasoning Activation Steering), a lightweight test-time framework that improves reasoning reliability by selectively intervening on neuron activations. AdaRAS identifies Reasoning-Critical Neurons (RCNs) via a polarity-aware mean-difference criterion and adaptively steers their activations during inference, improving incorrect reasoning traces while avoiding degradation on already-correct cases. Experiments on 10 mathematics and coding benchmarks demonstrate consistent improvements, including over 13% gains on AIME-24 and AIME-25. Moreover, AdaRAS exhibits strong transferability across datasets and scalability to stronger models, outperforming post-training methods without additional training or sampling cost.
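The two steps named in the abstract (selecting RCNs with a polarity-aware mean-difference criterion, then steering them adaptively) can be illustrated with a small sketch. This is not the authors' implementation. Everything here is an assumption made for illustration: the names `CriticalNeuron`, `select_critical_neurons`, `steer`, `k` and `alpha` are hypothetical. Polarity is taken to be the sign of the difference between the mean activation on correct traces and on incorrect traces. "Adaptive" is modeled as nudging a neuron toward its correct-trace mean only when it sits on the incorrect side of that mean, so activations that already look "correct" are left untouched. A real system would read and modify hidden activations inside the model during decoding rather than operate on plain Python lists.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass(frozen=True)
class CriticalNeuron:
    index: int     # position of the neuron in the activation vector
    polarity: int  # +1 if higher activation correlates with correctness, else -1
    score: float   # |mean difference|, used for ranking
    target: float  # mean activation over correct traces (assumed steering target)


def select_critical_neurons(correct, incorrect, k):
    """Hypothetical helper: rank neurons by |mean(correct) - mean(incorrect)|.

    `correct` and `incorrect` are lists of equal-length activation vectors,
    one vector per reasoning trace. Returns the top-k neurons with polarity.
    """
    n = len(correct[0])
    neurons = []
    for j in range(n):
        mu_c = fmean(v[j] for v in correct)
        mu_i = fmean(v[j] for v in incorrect)
        diff = mu_c - mu_i
        if diff == 0:
            continue  # no signal, so no polarity can be assigned
        neurons.append(CriticalNeuron(j, 1 if diff > 0 else -1, abs(diff), mu_c))
    neurons.sort(key=lambda c: c.score, reverse=True)
    return neurons[:k]


def steer(activations, neurons, alpha):
    """Hypothetical helper: return a steered copy of `activations`.

    Each selected neuron is moved a fraction `alpha` of the way toward its
    correct-trace mean, but only if it falls short in the direction of its
    polarity; otherwise it is left unchanged.
    """
    out = list(activations)
    for c in neurons:
        gap = c.target - out[c.index]
        if c.polarity * gap > 0:  # activation is on the "incorrect" side
            out[c.index] += alpha * gap
    return out


# Toy usage: three neurons, two traces per class.
correct = [[1.0, 0.0, 5.0], [3.0, 0.0, 5.0]]
incorrect = [[0.0, 1.0, 5.0], [0.0, 1.0, 5.0]]
rcns = select_critical_neurons(correct, incorrect, k=2)
print(steer([0.0, 1.0, 5.0], rcns, alpha=0.5))   # [1.0, 0.5, 5.0]
print(steer([2.5, -0.5, 5.0], rcns, alpha=0.5))  # [2.5, -0.5, 5.0]
```

In this toy run, neuron 2 has identical means in both classes and is never selected. The second call shows the gating rule at work: both selected neurons already lie past their correct-trace means, so nothing is changed, which mirrors (under the stated assumptions) the abstract's goal of avoiding degradation on already-correct cases.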