Diagnosing and Repairing Citation Failures in Generative Engine Optimization
This work addresses equitable visibility in AI-mediated information access, particularly for long-tail content, though it is an incremental improvement over existing GEO methods.
The paper tackles the problem of improving citation rates in Generative Engine Optimization (GEO) by introducing a diagnostic approach that identifies why documents fail to be cited and applies targeted repairs, achieving over 40% relative improvement in citation rates while modifying only 5% of content.
Generative Engine Optimization (GEO) aims to improve content visibility in AI-generated responses. However, existing methods measure contribution (how much a document influences a response) rather than citation, the mechanism that actually drives traffic back to creators. These methods also apply generic rewriting rules uniformly, without diagnosing why individual documents are not cited. This paper introduces a diagnostic approach to GEO that asks why a document fails to be cited and intervenes accordingly. We develop a unified framework comprising: (1) the first taxonomy of citation failure modes spanning different stages of a citation pipeline; (2) AgentGEO, an agentic system that diagnoses failures using this taxonomy, selects targeted repairs from a corresponding tool library, and iterates until citation is achieved; and (3) a document-centric benchmark evaluating whether optimizations generalize across held-out queries. AgentGEO achieves over 40% relative improvement in citation rates while modifying only 5% of content, compared to 25% for baselines. Our analysis reveals that generic optimization can harm long-tail content and that some documents face challenges that optimization alone cannot fully address. These findings have implications for equitable visibility in AI-mediated information access.
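The diagnose, repair, and iterate loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not AgentGEO's actual implementation: the names `diagnose_and_repair`, `RepairResult`, and `relative_improvement`, the string failure-mode labels, and the `max_rounds` cap are all hypothetical, and the diagnosis and repair steps are passed in as plain callables because this summary does not specify the taxonomy or the tool library. The relative-improvement helper uses the standard definition, which is assumed to match the metric reported here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical signatures: the real taxonomy and tool library are not given in this summary.
Diagnose = Callable[[str, str], Optional[str]]  # (document, query) -> failure mode, or None if cited
Repair = Callable[[str, str], str]              # (document, query) -> repaired document


@dataclass
class RepairResult:
    document: str        # final (possibly modified) document text
    cited: bool          # whether the diagnosis step reports the document as cited
    applied: List[str]   # failure modes that were repaired, in order


def diagnose_and_repair(
    document: str,
    query: str,
    diagnose: Diagnose,
    tools: Dict[str, Repair],
    max_rounds: int = 3,  # assumed iteration cap, not taken from the paper
) -> RepairResult:
    """Diagnose a citation failure, apply the matching repair, and repeat."""
    applied: List[str] = []
    for _ in range(max_rounds):
        mode = diagnose(document, query)
        if mode is None:
            # No failure detected: the document is considered cited.
            return RepairResult(document, True, applied)
        repair = tools.get(mode)
        if repair is None:
            # No tool targets this failure mode, so stop without editing further.
            return RepairResult(document, False, applied)
        document = repair(document, query)
        applied.append(mode)
    # Rounds exhausted: report whatever the final diagnosis says.
    return RepairResult(document, diagnose(document, query) is None, applied)


def relative_improvement(baseline_rate: float, new_rate: float) -> float:
    """Standard relative change: (new - baseline) / baseline."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (new_rate - baseline_rate) / baseline_rate
```

Keying the tool dictionary by failure mode mirrors the idea of a repair library that corresponds to the taxonomy, and the early return when no tool matches reflects the observation that some documents face challenges optimization alone cannot address.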