Adaptive Activation Cancellation for Hallucination Mitigation in Large Language Models

Eric Yocam, Varghese Vaidyan, Gurcan Comert, Paris Kalathas, Yong Wang, Judith L. Mwakalonge

arXiv:2603.10195v1 · 3.5 · h-index: 20

Predicted impact: top 98% in CL · last 90 days
Originality: Incremental advance

AI Analysis

This paper addresses factually incorrect text generation in LLMs with a surgical method that preserves model capabilities without fine-tuning. It is an incremental advance, since it builds on existing activation analysis techniques.

The paper tackles factual hallucinations in large language models by proposing Adaptive Activation Cancellation (AAC), an inference-time framework that suppresses hallucination-associated activations during generation. It reports improved accuracy on TruthfulQA and HaluEval across three model scales, with no degradation in WikiText-103 perplexity or MMLU reasoning accuracy.

Large Language Models frequently generate fluent but factually incorrect text. We propose Adaptive Activation Cancellation (AAC), a real-time inference-time framework that treats hallucination-associated neural activations as structured interference within the transformer residual stream, drawing an explicit analogy to classical adaptive noise cancellation from signal processing. The framework identifies Hallucination Nodes (H-Nodes) via layer-wise linear probing and suppresses them using a confidence-weighted forward hook during auto-regressive generation -- requiring no external knowledge, no fine-tuning, and no additional inference passes. Evaluated across OPT-125M, Phi-3-mini, and LLaMA 3-8B on TruthfulQA and HaluEval, the real-time hook is the only intervention that consistently improves downstream accuracy on all three scales. Critically, the method is strictly surgical: WikiText-103 perplexity and MMLU reasoning accuracy are preserved at exactly 0.0% degradation across all three model scales, a property that distinguishes AAC from interventions that trade fluency or general capability for factual improvement. On the LLaMA 3-8B scale, the hook additionally yields positive generation-level gains (MC1 +0.04; MC2 +0.003; Token-F1 +0.003) while achieving probe-space selectivity 5.94x - 3.5x higher than the ITI baseline -- demonstrating that targeted neuron-level suppression can simultaneously improve factual accuracy and preserve model capability.
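The probe-then-suppress idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not give the selection rule or the suppression formula, so the sketch assumes that H-Nodes are the dimensions with the largest absolute probe weight and that suppression multiplies those dimensions by `1 - alpha * confidence`. The names `select_h_nodes`, `suppress`, and `alpha` are hypothetical. In a real model the same logic would run inside a forward hook on a transformer layer (for example via PyTorch's `register_forward_hook`); plain Python lists keep the example dependency-free.

```python
import math


def sigmoid(x):
    """Logistic function mapping a probe logit to a confidence in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def select_h_nodes(probe_weights, k):
    """Pick the k dimensions with the largest |probe weight|.

    Assumption: weight magnitude is used as the selection criterion;
    the paper may rank candidate neurons differently.
    """
    ranked = sorted(range(len(probe_weights)),
                    key=lambda i: abs(probe_weights[i]),
                    reverse=True)
    return sorted(ranked[:k])


def suppress(hidden, probe_weights, probe_bias, h_nodes, alpha=1.0):
    """Scale the selected activations down in proportion to probe confidence.

    Assumption: the multiplicative rule (1 - alpha * confidence) is a
    stand-in for the paper's confidence-weighted suppression.
    """
    logit = sum(w * h for w, h in zip(probe_weights, hidden)) + probe_bias
    confidence = sigmoid(logit)       # probe's estimate of hallucination
    scale = 1.0 - alpha * confidence  # stronger suppression when confident
    out = list(hidden)
    for i in h_nodes:
        out[i] = hidden[i] * scale    # only the selected dimensions change
    return out, confidence


# Toy 4-dimensional hidden state with made-up probe parameters.
probe_weights = [0.1, -2.0, 0.05, 1.5]
probe_bias = 0.0
hidden = [1.0, -1.0, 2.0, 1.0]

h_nodes = select_h_nodes(probe_weights, k=2)
new_hidden, conf = suppress(hidden, probe_weights, probe_bias, h_nodes, alpha=0.5)
print(h_nodes, round(conf, 3), [round(v, 3) for v in new_hidden])
```

Dimensions outside the selected set pass through unchanged, which is one way to read the abstract's claim that the intervention is surgical. The sketch says nothing about whether perplexity or MMLU accuracy is preserved in practice.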
