CY AI Apr 11, 2025

Hallucination, reliability, and the role of generative AI in science

arXiv:2504.08526v1 11 citations h-index: 4
Originality: Synthesis-oriented
AI Analysis

This paper addresses the challenge of ensuring the reliability of generative AI in scientific domains. It offers a conceptual framework for distinguishing harmful errors from benign ones, though its contribution is incremental, refining existing approaches.

The paper tackles the problem of generative AI producing plausible but incorrect outputs (hallucinations) in scientific applications. It argues that not all hallucinations are equally harmful, and that scientific workflows such as those surrounding AlphaFold and GenCast can neutralize the most damaging ones through theoretical constraints imposed during training and error screening at inference time.

Generative AI is increasingly used in scientific domains, from protein folding to climate modeling. But these models produce distinctive errors known as hallucinations - outputs that are incorrect yet superficially plausible. Worse, some arguments suggest that hallucinations are an inevitable consequence of the mechanisms underlying generative inference. Fortunately, such arguments rely on a conception of hallucination defined solely with respect to internal properties of the model, rather than in reference to the empirical target system. This conception fails to distinguish epistemically benign errors from those that threaten scientific inference. I introduce the concept of corrosive hallucination to capture the epistemically troubling subclass: misrepresentations that are substantively misleading and resistant to systematic anticipation. I argue that although corrosive hallucinations do pose a threat to scientific reliability, they are not inevitable. Scientific workflows such as those surrounding AlphaFold and GenCast, both of which serve as case studies, can neutralize their effects by imposing theoretical constraints during training, and by strategically screening for errors at inference time. When embedded in such workflows, generative AI can reliably contribute to scientific knowledge.

