LG · AI · Mar 3

Understanding the Dynamics of Demonstration Conflict in In-Context Learning

arXiv:2603.04464v1 · 2 citations
Originality: Highly original
AI Analysis

This research addresses demonstration conflict in in-context learning, a problem that matters to developers and users of large language models. It provides an incremental understanding of the underlying mechanisms and of potential solutions.

The researchers investigated how large language models process conflicting demonstrations in in-context learning. They found that a single corrupted demonstration can cause substantial performance degradation, and that targeted ablation of a small number of specific attention heads can improve performance by over 10%. The study also reveals a two-phase computational structure: models encode both the correct and the corrupted rule in intermediate layers but develop prediction confidence only in late layers.
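The summary does not say exactly how the identified heads are masked. One common approach, assumed here, is zero-ablation: a head's output is multiplied by 0 before the heads are concatenated and passed through the output projection. The toy NumPy sketch below shows that mechanism on a single causal self-attention layer with random weights. `multi_head_attention`, `head_mask`, and all shapes are hypothetical. In a real experiment, the masked heads would be the ones the paper's analysis identifies in an actual model.

```python
import numpy as np


def multi_head_attention(x, w_q, w_k, w_v, w_o, head_mask=None):
    """Toy causal self-attention layer with optional per-head zero-ablation.

    x:             (seq, d_model) input activations
    w_q, w_k, w_v: (n_heads, d_model, d_head) per-head projections
    w_o:           (n_heads * d_head, d_model) output projection (no bias)
    head_mask:     (n_heads,) of 1s and 0s; 0 removes that head's contribution
    """
    n_heads, _, d_head = w_q.shape
    seq = x.shape[0]
    if head_mask is None:
        head_mask = np.ones(n_heads)
    # True above the diagonal: future positions a token may not attend to
    future = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    head_outputs = []
    for h in range(n_heads):
        q, k, v = x @ w_q[h], x @ w_k[h], x @ w_v[h]
        scores = np.where(future, -np.inf, q @ k.T / np.sqrt(d_head))
        # Numerically stable softmax over key positions
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        # Scale this head's output by its mask value before concatenation
        head_outputs.append(head_mask[h] * (weights @ v))
    return np.concatenate(head_outputs, axis=-1) @ w_o


rng = np.random.default_rng(0)
seq, d_model, n_heads, d_head = 5, 8, 4, 2
x = rng.normal(size=(seq, d_model))
w_q, w_k, w_v = (rng.normal(size=(n_heads, d_model, d_head)) for _ in range(3))
w_o = rng.normal(size=(n_heads * d_head, d_model))

baseline = multi_head_attention(x, w_q, w_k, w_v, w_o)
ablated = multi_head_attention(x, w_q, w_k, w_v, w_o, head_mask=np.array([1, 0, 1, 1]))
print("max change from ablating head 1:", np.abs(baseline - ablated).max())
```

The output projection here is linear and has no bias. The layer's output is therefore exactly the sum of the per-head contributions, so zeroing one head removes only that head's effect and leaves the others unchanged.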

In-context learning enables large language models to perform novel tasks through few-shot demonstrations. However, demonstrations can naturally contain noise and conflicting examples, which makes this capability vulnerable. To understand how models process such conflicts, we study demonstration-dependent tasks that require models to infer underlying patterns, a process we characterize as rule inference. We find that models suffer substantial performance degradation from a single demonstration with a corrupted rule. This systematic misleading behavior motivates our investigation of how models process conflicting evidence internally. Using linear probes and logit lens analysis, we discover that under corruption, models encode both correct and incorrect rules in intermediate layers but develop prediction confidence only in late layers, revealing a two-phase computational structure. We then identify the attention heads underlying the reasoning failures in each phase. Vulnerability Heads in early-to-middle layers exhibit positional attention bias with high sensitivity to corruption, while Susceptible Heads in late layers significantly reduce support for correct predictions when exposed to the corrupted evidence. Targeted ablation validates our findings: masking a small number of the identified heads improves performance by over 10%.
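As commonly practiced, logit lens analysis decodes an intermediate residual-stream state by passing it through the model's final normalization and unembedding matrix. This gives a per-layer read-out of what the model "would predict" at that depth.

The NumPy sketch below makes several simplifying assumptions:

- It assumes a pre-LayerNorm transformer.
- It drops the final LayerNorm's learned scale and bias.
- It uses synthetic residuals in place of activations cached from a real model.

The names `logit_lens`, `correct_id`, and `corrupted_id` are hypothetical. Tracking the two answer probabilities across layers is one way to look for the late-layer commitment the abstract describes.

```python
import numpy as np


def layer_norm(h, eps=1e-5):
    # Final LayerNorm without its learned scale and bias (a simplification)
    mean = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mean) / np.sqrt(var + eps)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def logit_lens(residuals, w_unembed, correct_id, corrupted_id):
    """Decode each layer's residual state at the answer position.

    residuals: (n_layers, d_model) residual stream after each layer
    w_unembed: (d_model, vocab) unembedding matrix
    Returns per-layer probabilities of the correct-rule and corrupted-rule tokens.
    """
    probs = softmax(layer_norm(residuals) @ w_unembed)
    return probs[:, correct_id], probs[:, corrupted_id]


rng = np.random.default_rng(0)
n_layers, d_model, vocab = 12, 16, 50
correct_id, corrupted_id = 3, 7  # hypothetical token ids for the two competing answers
w_unembed = rng.normal(size=(d_model, vocab))

# Synthetic residuals: noise plus a correct-minus-corrupted component that grows with depth
direction = w_unembed[:, correct_id] - w_unembed[:, corrupted_id]
scale = np.linspace(0.0, 3.0, n_layers) ** 2
residuals = rng.normal(size=(n_layers, d_model)) + scale[:, None] * direction

p_correct, p_corrupted = logit_lens(residuals, w_unembed, correct_id, corrupted_id)
for layer, (pc, pw) in enumerate(zip(p_correct, p_corrupted)):
    print(f"layer {layer:2d}: P(correct)={pc:.3f}  P(corrupted)={pw:.3f}")
```

In an actual experiment, the same comparison would be run on residuals cached at the query position for both clean and corrupted prompts, using the model's own final-norm weights.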

