CL | AI | Dec 11, 2025

The FACTS Leaderboard: A Comprehensive Benchmark for Large Language Model Factuality

arXiv:2512.10791v1 | 11 citations | h-index: 85
Originality: Synthesis-oriented
AI Analysis

The paper addresses the problem of assessing the factual accuracy of AI-generated text, a concern for both researchers and developers. Its contribution is incremental, since it builds on existing benchmarking approaches.

The paper introduces The FACTS Leaderboard, a benchmark suite that evaluates large language models' factuality across four diverse scenarios, providing a holistic measure through automated scoring and averaging of sub-leaderboard performances.

We introduce The FACTS Leaderboard, an online leaderboard suite and associated set of benchmarks that comprehensively evaluates the ability of language models to generate factually accurate text across diverse scenarios. The suite provides a holistic measure of factuality by aggregating the performance of models on four distinct sub-leaderboards: (1) FACTS Multimodal, which measures the factuality of responses to image-based questions; (2) FACTS Parametric, which assesses models' world knowledge by answering closed-book factoid questions from internal parameters; (3) FACTS Search, which evaluates factuality in information-seeking scenarios, where the model must use a search API; and (4) FACTS Grounding (v2), which evaluates whether long-form responses are grounded in provided documents, featuring significantly improved judge models. Each sub-leaderboard employs automated judge models to score model responses, and the final suite score is an average of the four components, designed to provide a robust and balanced assessment of a model's overall factuality. The FACTS Leaderboard Suite will be actively maintained, containing both public and private splits to allow for external participation while guarding its integrity. It can be found at https://www.kaggle.com/benchmarks/google/facts .
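The aggregation described in the abstract can be sketched in a few lines. This is a minimal illustration, not the leaderboard's actual implementation: it assumes the "average" is an unweighted arithmetic mean, and the function name `facts_suite_score`, the dictionary keys, and the example scores are all hypothetical.

```python
from statistics import mean

# Hypothetical keys for the four sub-leaderboards named in the abstract.
SUB_LEADERBOARDS = ("multimodal", "parametric", "search", "grounding_v2")


def facts_suite_score(scores: dict[str, float]) -> float:
    """Return the unweighted mean of the four sub-leaderboard scores.

    Assumption: the abstract says only "an average", so equal weighting
    is a guess made for illustration.
    """
    missing = [name for name in SUB_LEADERBOARDS if name not in scores]
    if missing:
        raise ValueError(f"missing sub-leaderboard scores: {missing}")
    return mean(scores[name] for name in SUB_LEADERBOARDS)


# Made-up scores, used only to exercise the function.
example = {
    "multimodal": 40.0,
    "parametric": 60.0,
    "search": 50.0,
    "grounding_v2": 70.0,
}
print(facts_suite_score(example))
```

Requiring all four scores before averaging reflects the idea that the suite score is meant to be a balanced measure: a model evaluated on only some sub-leaderboards would otherwise get a number that is not comparable with the rest.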

