LG · AI · Apr 26

The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation

arXiv:2604.23750 · 37.31 citations
Predicted impact top 63% in LG · last 90 days
Originality: Incremental advance
AI Analysis

For researchers and practitioners using hypernetwork-based adaptation for LLMs, this work identifies and mitigates a critical failure mode in knowledge conflicts, offering a practical, training-free solution.

Hypernetwork-based instant LLM adaptation fails systematically on knowledge conflicts, with accuracy dropping to 46.4% on deep conflicts. The authors identify this as a magnitude problem and propose training-free methods (Selective Layer Boosting and Conflict-Aware Internalization) that raise deep-conflict accuracy to 71.0% on Gemma-2B and 72.5% on Mistral-7B, outperforming retrieval-augmented generation on medium conflicts by 18 percentage points.

Hypernetwork-based methods such as Doc-to-LoRA internalize a document into an LLM's weights in a single forward pass, but they fail systematically on conflicts: when the document contradicts pretraining knowledge, accuracy collapses to 46.4% on the deepest facts. We show the failure is a magnitude problem rather than a representational one. The hypernetwork already targets the right layers, but its adapter margin is approximately constant across documents while the pretrained margin grows with training frequency, so deep conflicts lose by construction. The account predicts that failure should track prior strength: sorting 194 conflicts by the base model's log-probability on the contradicted fact, baseline accuracy falls from 68% on weak-prior questions to 16% on strong-prior ones, a 52 percentage-point gap. The cure is amplitude. Selective Layer Boosting scales the adapter at its top-norm layers, and Conflict-Aware Internalization triggers boosting only when the base model is confident. Both are training-free; together they raise deep-conflict accuracy from 46.4% to 71.0% on Gemma-2B and from 53.6% to 72.5% on Mistral-7B while preserving novel-knowledge recall, and beat vanilla retrieval-augmented generation on medium conflicts by 18 percentage points despite operating entirely in parameter space. We release KID-Bench, a 489-question benchmark that separates novel recall, cross-knowledge combination, and prior-graded conflicts.
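The two training-free steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how layer norms are measured, how many layers are boosted, what scale factor is used, or how base-model confidence is thresholded. The Frobenius norm, the parameters `top_k`, `alpha`, and `threshold`, and both function names are assumptions made for illustration only.

```python
import math


def frobenius_norm(matrix):
    """Frobenius norm of a 2D list of floats."""
    return math.sqrt(sum(x * x for row in matrix for x in row))


def selective_layer_boost(adapter, top_k=2, alpha=2.0):
    """Scale the adapter update at its top_k highest-norm layers by alpha.

    adapter: dict mapping a layer name to its weight delta (2D list).
    top_k and alpha are hypothetical knobs, not values from the paper.
    Returns a new dict; the input is left unmodified.
    """
    norms = {name: frobenius_norm(delta) for name, delta in adapter.items()}
    top_layers = set(sorted(norms, key=norms.get, reverse=True)[:top_k])
    boosted = {}
    for name, delta in adapter.items():
        scale = alpha if name in top_layers else 1.0
        boosted[name] = [[scale * x for x in row] for row in delta]
    return boosted


def conflict_aware_internalize(adapter, base_logprob, threshold=-1.0,
                               top_k=2, alpha=2.0):
    """Boost only when the base model is confident in the contradicted fact.

    base_logprob: base model's log-probability of the fact the document
    contradicts. Using a fixed threshold on it is an assumption.
    """
    if base_logprob >= threshold:
        return selective_layer_boost(adapter, top_k=top_k, alpha=alpha)
    return adapter


# Toy adapter with three layers of very different magnitudes.
adapter = {
    "layer_0": [[0.1, 0.0], [0.0, 0.1]],
    "layer_1": [[1.0, 0.0], [0.0, 1.0]],
    "layer_2": [[0.5, 0.5], [0.5, 0.5]],
}

strong_prior = conflict_aware_internalize(adapter, base_logprob=-0.1,
                                          top_k=1, alpha=3.0)
weak_prior = conflict_aware_internalize(adapter, base_logprob=-5.0,
                                        top_k=1, alpha=3.0)
```

In this toy case only `layer_1`, the layer with the largest norm, is scaled when the prior is strong, and the adapter is returned unchanged when the prior is weak. Leaving the weak-prior path untouched mirrors the abstract's claim that novel-knowledge recall is preserved.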
