Predictable Confabulations: Factual Recall by LLMs Scales with Model Size and Topic Frequency
Provides a predictive scaling law for factual recall, benefiting model developers and users by linking performance to training data composition.
The paper shows that factual recall in LLMs follows a sigmoid scaling law based on model size and topic frequency in training data, explaining 60-94% of variance across 38 models and 8,900 references.
While scaling laws govern aggregate large language model performance, no scaling law has linked factual recall to both model size and training-data composition. We evaluated 38 models on over 8,900 scholarly references, each checked by an automated reference verification system. Recall quality follows a sigmoid in the log-linear combination of model parameter count and topic representation in training data. These two variables alone explain 60% of the variance across 16 dense models from four families, rising to 74-94% within individual families. The form matches a superposition-inspired account in which recall is gated by a signal-to-noise ratio: signal strength scales with concept frequency, and the noise floor scales with model capacity.
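The functional form described in the abstract can be illustrated with a minimal sketch. It assumes base-10 logarithms and uses the hypothetical names `recall_quality`, `a`, `b`, and `c`; the coefficient values are placeholders chosen for illustration, not fitted values from the paper, and the paper's exact parameterization may differ.

```python
import math


def recall_quality(n_params, topic_freq, a=1.0, b=1.0, c=-12.0):
    """Sigmoid of a log-linear combination of model size and topic frequency.

    n_params   -- model parameter count (must be > 0)
    topic_freq -- measure of topic representation in training data (must be > 0)
    a, b, c    -- illustrative placeholder coefficients, NOT values from the paper
    """
    # Log-linear combination of the two predictors.
    z = a * math.log10(n_params) + b * math.log10(topic_freq) + c
    # Logistic sigmoid maps the combination to a score in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # With the placeholder coefficients, predicted recall rises with either
    # model size or topic frequency.
    for n in (1e8, 1e9, 1e10):
        for f in (1e2, 1e3, 1e4):
            print(f"params={n:.0e} freq={f:.0e} recall={recall_quality(n, f):.3f}")
```

Under this form, with positive coefficients, a rarer topic can be offset by a larger model and vice versa, since the two predictors enter the sigmoid additively in log space.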