IR · AI · Feb 18

Retrieval Collapses When AI Pollutes the Web

arXiv:2602.16136v1 · h-index: 6
AI Analysis

This work addresses a structural risk for information retrieval systems such as search engines and RAG pipelines, one that could affect every user who relies on web-grounded AI. The contribution is incremental: it characterizes an emerging failure mode and supports it with experimental evidence.

The paper examines how AI-generated content pollutes the web and leads to retrieval collapse, in which search engines and RAG systems come to rely on synthetic sources. In its experiments, 67% pool contamination produced over 80% exposure contamination in the SEO scenario, and harmful content exposure reached up to about 19% in the adversarial scenario.

The rapid proliferation of AI-generated content on the Web presents a structural risk to information retrieval, as search engines and Retrieval-Augmented Generation (RAG) systems increasingly consume evidence produced by Large Language Models (LLMs). We characterize this ecosystem-level failure mode as Retrieval Collapse, a two-stage process where (1) AI-generated content dominates search results, eroding source diversity, and (2) low-quality or adversarial content infiltrates the retrieval pipeline. We analyzed this dynamic through controlled experiments involving both high-quality SEO-style content and adversarially crafted content. In the SEO scenario, a 67% pool contamination led to over 80% exposure contamination, creating a homogenized yet deceptively healthy state where answer accuracy remains stable despite the reliance on synthetic sources. Conversely, under adversarial contamination, baselines like BM25 exposed ~19% of harmful content, whereas LLM-based rankers demonstrated stronger suppression capabilities. These findings highlight the risk of retrieval pipelines quietly shifting toward synthetic evidence and the need for retrieval-aware strategies to prevent a self-reinforcing cycle of quality decline in Web-grounded systems.
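
The abstract contrasts two quantities, pool contamination and exposure contamination, without defining them here. The sketch below shows one plausible reading: pool contamination as the share of AI-generated documents in the candidate pool, and exposure contamination as the share of top-k result slots filled by AI-generated documents. These definitions, the `Doc` type, the `is_synthetic` label, and the toy data are all assumptions for illustration; they are not taken from the paper and do not reproduce its results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    is_synthetic: bool  # hypothetical label: True if the document is AI-generated


def pool_contamination(pool):
    """Fraction of the candidate pool that is AI-generated (assumed definition)."""
    if not pool:
        return 0.0
    return sum(d.is_synthetic for d in pool) / len(pool)


def exposure_contamination(rankings, k=10):
    """Fraction of top-k slots, pooled over all queries, held by AI-generated docs
    (assumed definition)."""
    shown = [d for ranked in rankings for d in ranked[:k]]
    if not shown:
        return 0.0
    return sum(d.is_synthetic for d in shown) / len(shown)


# Toy data: 3 human-written and 6 synthetic documents (two thirds synthetic).
humans = [Doc(f"h{i}", False) for i in range(3)]
synths = [Doc(f"s{i}", True) for i in range(6)]
pool = humans + synths

# Hypothetical ranked lists for two queries; a ranker that favors synthetic
# documents places them ahead of the human-written ones.
rankings = [
    [synths[0], synths[1], humans[0], humans[1]],
    [synths[2], synths[3], synths[4], humans[2]],
]

pool_rate = pool_contamination(pool)
exposure_rate = exposure_contamination(rankings, k=3)

print(f"pool contamination:     {pool_rate:.1%}")      # 66.7%
print(f"exposure contamination: {exposure_rate:.1%}")  # 83.3%
```

In this toy setup the exposure rate exceeds the pool rate because the ranker places synthetic documents above human-written ones, which is the amplification pattern the abstract describes for the SEO scenario.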

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
