CL · DB · Mar 25

LLMpedia: A Transparent Framework to Materialize an LLM's Encyclopedic Knowledge at Scale

arXiv:2603.24080 · 31.81 citations · h-index: 10
Predicted impact: top 25% in CL · last 90 days · Originality: Highly original
AI Analysis

For researchers and practitioners, this work addresses the problem of benchmarks overestimating LLM factuality, and it highlights significant gaps in knowledge coverage and accuracy.

The paper tackles the incomplete picture of LLM factuality by introducing LLMpedia, a framework that generates encyclopedic articles from parametric memory. For gpt-5-mini, it reports a verifiable true rate of only 74.7% on Wikipedia-covered subjects, more than 15 percentage points below benchmark estimates, and 63.2% on frontier subjects.

Benchmarks such as MMLU suggest flagship language models approach factuality saturation, with scores above 90%. We show this picture is incomplete. LLMpedia generates encyclopedic articles entirely from parametric memory, producing ~1M articles across three model families without retrieval. For gpt-5-mini, the verifiable true rate on Wikipedia-covered subjects is only 74.7%, more than 15 percentage points below the benchmark-based picture, consistent with the availability bias of fixed-question evaluation. Beyond Wikipedia, frontier subjects verifiable only through curated web evidence fall further to 63.2% true rate. Wikipedia covers just 61% of surfaced subjects, and three model families overlap by only 7.3% in subject choice. In a capture-trap benchmark inspired by prior analysis of Grokipedia, LLMpedia achieves substantially higher factuality at roughly half the textual similarity to Wikipedia. Unlike Grokipedia, every prompt, artifact, and evaluation verdict is publicly released, making LLMpedia the first fully open parametric encyclopedia, bridging factuality evaluation and knowledge materialization. All data, code, and a browsable interface are at https://llmpedia.net.
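The percentage-point comparisons in the abstract can be checked with a small sketch. It assumes a benchmark baseline of exactly 90%, whereas the abstract only says scores are "above 90%", so the computed gaps are lower bounds. The names `BENCHMARK_FLOOR` and `gap_in_points` are illustrative and do not come from the LLMpedia code.

```python
# Figures quoted in the abstract. The 90.0 baseline is an assumption:
# the abstract only states that benchmark scores are "above 90%".
BENCHMARK_FLOOR = 90.0      # assumed lower bound of the benchmark-based picture
WIKIPEDIA_TRUE_RATE = 74.7  # gpt-5-mini, Wikipedia-covered subjects
FRONTIER_TRUE_RATE = 63.2   # frontier subjects, curated web evidence


def gap_in_points(baseline: float, observed: float) -> float:
    """Return the difference between two percentages, in percentage points."""
    return round(baseline - observed, 1)


wikipedia_gap = gap_in_points(BENCHMARK_FLOOR, WIKIPEDIA_TRUE_RATE)
frontier_gap = gap_in_points(BENCHMARK_FLOOR, FRONTIER_TRUE_RATE)
frontier_drop = gap_in_points(WIKIPEDIA_TRUE_RATE, FRONTIER_TRUE_RATE)

print(f"Wikipedia-covered gap vs. benchmark floor: {wikipedia_gap} points")
print(f"Frontier gap vs. benchmark floor: {frontier_gap} points")
print(f"Drop from Wikipedia-covered to frontier: {frontier_drop} points")
```

Under that assumption, the Wikipedia-covered gap is at least 15.3 points, which matches the "more than 15 percentage points" claim, and frontier subjects sit a further 11.5 points lower.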

