The Future of Facts: Tracing the Factual Generation-Verification Gap

Tim R. Davidson, Anja Surina, Caglar Gulcehre

arXiv:2605.27564 · 93.0 · h-index: 10 · Has Code

Predicted impact: top 20% in CL · last 90 days · Originality: Incremental advance

AI Analysis

This work systematically traces how factual generation and verification capabilities develop during language model training. These dynamics underlie many recent self-improvement and reasoning methods.

The paper studies the generation-verification gap in language models' factual knowledge. Across four model families at two scales each, three patterns hold:

- Verification is learned before generation.
- Verification is more resilient to continual learning.
- Factual updates can leave a model in a "multi-verse" state, in which both the old and the new answer are verified as correct.
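As an illustration only, the gap and the "multi-verse" state can be made concrete with a small Python sketch. It assumes each fact yields one binary outcome for generation and one for verification. It also defines the GV-gap as verification accuracy minus generation accuracy. The paper's exact metric is not given here, and the names `FactProbe`, `gv_gap`, and `is_multiverse` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FactProbe:
    """Hypothetical per-fact probe outcome (illustrative names, not from the paper)."""
    generated_correct: bool  # free-form generation produced the target answer
    verified_correct: bool   # the model accepted the target answer as correct


def gv_gap(probes: list[FactProbe]) -> float:
    """Assumed GV-gap: verification accuracy minus generation accuracy."""
    if not probes:
        raise ValueError("need at least one probe")
    n = len(probes)
    gen_acc = sum(p.generated_correct for p in probes) / n
    ver_acc = sum(p.verified_correct for p in probes) / n
    return ver_acc - gen_acc


def is_multiverse(old_verified: bool, new_verified: bool) -> bool:
    """After a factual update, flag facts where BOTH old and new answers are accepted."""
    return old_verified and new_verified


# Toy inputs, invented for illustration (not data from the paper).
probes = [
    FactProbe(generated_correct=False, verified_correct=True),
    FactProbe(generated_correct=True, verified_correct=True),
    FactProbe(generated_correct=False, verified_correct=False),
    FactProbe(generated_correct=True, verified_correct=True),
]
print(gv_gap(probes))             # 0.25: verification 3/4 vs generation 2/4
print(is_multiverse(True, True))  # True
```

A positive value means the model accepts correct answers more often than it produces them.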

Language models are becoming the default interface to factual knowledge, yet they often verify outputs more reliably than they generate them. This generation-verification gap (GV-gap) underlies many recent advances in self-improvement and reasoning, but its dynamics on factual knowledge specifically remain poorly understood. We focus on the training mechanisms underlying factual GV-gaps, distinguishing them from their computational and aesthetic counterparts. We trace generation and verification capabilities through three training phases (acquisition, continual learning, and updating) across four open-source model families at two scales each. Three findings recur across models: (i) verification is consistently learned before generation; (ii) verification is more robust to continual learning than generation; and (iii) factual updates can leave models in a "multi-verse" state, simultaneously verifying both old and new answers as correct. Natural experiments on frontier models reproduce these dynamics at scale and reveal residual verification biases on well-covered facts.
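Finding (i) is an ordering claim over training checkpoints. The minimal sketch below shows how such an ordering could be checked. It assumes accuracy curves for generation and verification are measured at the same, ascending checkpoints, and it uses a hypothetical onset threshold of 0.5. The helper names and toy numbers are invented for illustration and are not results from the paper.

```python
from typing import Optional, Sequence


def onset_step(steps: Sequence[int], accuracies: Sequence[float],
               threshold: float) -> Optional[int]:
    """First checkpoint step (steps assumed ascending) whose accuracy reaches `threshold`."""
    if len(steps) != len(accuracies):
        raise ValueError("steps and accuracies must have equal length")
    for step, acc in zip(steps, accuracies):
        if acc >= threshold:
            return step
    return None  # capability never reached the threshold


def verification_first(steps: Sequence[int], gen_acc: Sequence[float],
                       ver_acc: Sequence[float], threshold: float = 0.5) -> bool:
    """True if verification reaches the threshold strictly earlier than generation."""
    v = onset_step(steps, ver_acc, threshold)
    g = onset_step(steps, gen_acc, threshold)
    if v is None:
        return False
    return g is None or v < g


# Toy curves, invented for illustration (not data from the paper).
steps = [100, 200, 300, 400]
gen = [0.05, 0.20, 0.45, 0.60]
ver = [0.30, 0.55, 0.70, 0.80]
print(onset_step(steps, ver, 0.5), onset_step(steps, gen, 0.5))  # 200 400
print(verification_first(steps, gen, ver))                       # True
```

A fixed threshold is the simplest onset criterion. A fuller analysis might compare whole learning curves or sweep several thresholds.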
