CL AI LG | Jul 16, 2025

Tracing Facts or just Copies? A critical investigation of the Competitions of Mechanisms in Large Language Models

arXiv:2507.11809v1 | Trans. Mach. Learn. Res.
Originality: Synthesis-oriented
AI Analysis

This paper addresses the problem of understanding internal mechanisms in LLMs, a topic of interest to researchers, but its contribution is incremental: it reproduces and reconciles prior studies.

The study investigates how Large Language Models handle competing factual and counterfactual information. It finds that attention heads promote factual output through general copy suppression rather than selective counterfactual suppression, and that the behavior of these heads is domain-dependent, with larger models showing more specialized patterns.

This paper presents a reproducibility study examining how Large Language Models (LLMs) manage competing factual and counterfactual information, focusing on the role of attention heads in this process. We attempt to reproduce and reconcile findings from three recent studies (Ortu et al.; Yu, Merullo, and Pavlick; and McDougall et al.) that investigate the competition between model-learned facts and contradictory context information through Mechanistic Interpretability tools. Our study specifically examines the relationship between attention head strength and factual output ratios, evaluates competing hypotheses about attention heads' suppression mechanisms, and investigates the domain specificity of these attention patterns. Our findings suggest that attention heads promoting factual output do so via general copy suppression rather than selective counterfactual suppression, as strengthening them can also inhibit correct facts. Additionally, we show that attention head behavior is domain-dependent, with larger models exhibiting more specialized and category-sensitive patterns.
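The distinction between general copy suppression and selective counterfactual suppression can be illustrated with a toy sketch. This is not the paper's experimental code. The three-token vocabulary, the logit values, the scaling factor `alpha`, and the helper names `final_logits` and `factual_rate` are all hypothetical, chosen only to show why strengthening a head that suppresses any token copied from the context can also inhibit a correct fact.

```python
import numpy as np

def final_logits(base_logits, context_token_ids, alpha, suppression=1.0):
    """Toy readout (assumption): a copy-suppression head lowers the logit of
    every token that appears in the context, scaled by the head strength alpha."""
    logits = np.asarray(base_logits, dtype=float).copy()
    for tok in set(context_token_ids):
        logits[tok] -= alpha * suppression
    return logits

def factual_rate(prompts, alpha):
    """Fraction of prompts whose top-1 prediction is the factual token."""
    hits = 0
    for base_logits, context_ids, fact_id in prompts:
        pred = int(np.argmax(final_logits(base_logits, context_ids, alpha)))
        hits += int(pred == fact_id)
    return hits / len(prompts)

# Hypothetical vocabulary: 0 = factual token, 1 = counterfactual token, 2 = distractor.
# Each prompt is (base logits, token ids present in the context, factual token id).

# Context states the counterfactual (token 1); base logits favour copying it.
counterfactual_prompts = [
    ([2.0, 3.5, 0.0], [1], 0),
    ([1.5, 2.8, 0.5], [1], 0),
]

# Context states the correct fact (token 0); base logits favour the fact.
factual_context_prompts = [
    ([3.0, 1.0, 2.5], [0], 0),
    ([2.5, 0.5, 2.0], [0], 0),
]

for alpha in (0.0, 2.0):
    print(
        f"alpha={alpha}: "
        f"counterfactual-context rate={factual_rate(counterfactual_prompts, alpha)}, "
        f"factual-context rate={factual_rate(factual_context_prompts, alpha)}"
    )
```

In this toy setup, raising `alpha` moves the factual rate from 0 to 1 when the context contains the counterfactual token, but from 1 to 0 when the context contains the correct fact. A head that suppressed only counterfactual tokens would leave the second group unchanged, which is the contrast the abstract draws.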
