Hallucination Basins: A Dynamic Framework for Understanding and Controlling LLM Hallucinations

arXiv:2604.04743 · 67.61 citations · Has Code

AI Analysis

This work addresses factual inaccuracies in LLM outputs, a concern for users who depend on accurate text generation, though its contribution is incremental because it builds on existing dynamical systems frameworks.

The authors analyze LLM hallucinations as geometric basins in latent space, find that hallucination separability varies by task, and show that geometry-aware steering can reduce hallucination probability without retraining.

Large language models (LLMs) hallucinate: they produce fluent outputs that are factually incorrect. We present a geometric dynamical systems framework in which hallucinations arise from task-dependent basin structure in latent space. Using autoregressive hidden-state trajectories across multiple open-source models and benchmarks, we find that separability is strongly task-dependent rather than universal: factoid settings can show clearer basin separation, whereas summarization and misconception-heavy settings are typically less stable and often overlap. We formalize this behavior with task-complexity and multi-basin theorems, characterize basin emergence in L-layer transformers, and show that geometry-aware steering can reduce hallucination probability without retraining.
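The abstract does not say how separability is measured or how steering is applied, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes hidden states are plain vectors already labeled factual or hallucinated, uses a Fisher-style ratio along the centroid-difference direction as a hypothetical separability proxy, and steers by shifting a hidden state toward the factual centroid. All function names, the synthetic clusters, and the `alpha` parameter are made up for illustration.

```python
import math
import random


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit_direction(src, dst):
    """Unit vector pointing from src to dst (assumes src != dst)."""
    diff = [d - s for s, d in zip(src, dst)]
    norm = math.sqrt(dot(diff, diff))
    return [x / norm for x in diff]


def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def separability(factual, hallucinated):
    """Hypothetical proxy: squared centroid gap over summed variance,
    both measured along the centroid-difference direction."""
    direction = unit_direction(mean_vector(hallucinated), mean_vector(factual))
    proj_f = [dot(v, direction) for v in factual]
    proj_h = [dot(v, direction) for v in hallucinated]
    gap = (sum(proj_f) / len(proj_f) - sum(proj_h) / len(proj_h)) ** 2
    return gap / (variance(proj_f) + variance(proj_h) + 1e-12)


def steer(hidden, factual, hallucinated, alpha=1.0):
    """Shift one hidden state by alpha along the direction that points
    from the hallucinated centroid to the factual centroid."""
    direction = unit_direction(mean_vector(hallucinated), mean_vector(factual))
    return [h + alpha * d for h, d in zip(hidden, direction)]


def make_cluster(center, spread, n, rng):
    """Synthetic stand-in for hidden states: Gaussian noise around a center."""
    return [[c + rng.gauss(0.0, spread) for c in center] for _ in range(n)]


rng = random.Random(0)

# "Factoid-like" toy setting: two well-separated clusters.
clear_f = make_cluster([2.0, 0.0, 0.0, 0.0], 0.5, 200, rng)
clear_h = make_cluster([-2.0, 0.0, 0.0, 0.0], 0.5, 200, rng)

# "Summarization-like" toy setting: heavily overlapping clusters.
mixed_f = make_cluster([0.2, 0.0, 0.0, 0.0], 1.0, 200, rng)
mixed_h = make_cluster([-0.2, 0.0, 0.0, 0.0], 1.0, 200, rng)

print("separable setting:  ", separability(clear_f, clear_h))
print("overlapping setting:", separability(mixed_f, mixed_h))

# Steering a hallucinated state moves it toward the factual centroid.
steered = steer(clear_h[0], clear_f, clear_h, alpha=2.0)
```

On this toy data the score is much larger for the well-separated pair than for the overlapping pair, which mirrors the task-dependence the abstract describes. A real analysis would substitute hidden states extracted from a model for the synthetic clusters.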
