CL · Jan 21

ClaimDB: A Fact Verification Benchmark over Large Structured Data

Michael Theologitis, Preetam Prabhu Srikar Dammu, Chirag Shah, Dan Suciu

arXiv:2601.14698v1 · 1.62 citations · h-index: 4 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of reliable fact verification in high-stakes domains such as healthcare and governance. It is an incremental advance, since it builds on existing benchmarks.

The authors tackle fact verification over large structured data by introducing ClaimDB, a benchmark whose evidence is drawn from millions of records across multiple tables. They find that none of the 30 state-of-the-art LLMs they test exceeds 83% accuracy, and more than half score below 55%.

Despite substantial progress in fact-verification benchmarks, claims grounded in large-scale structured data remain underexplored. In this work, we introduce ClaimDB, the first fact-verification benchmark where the evidence for claims is derived from compositions of millions of records and multiple tables. ClaimDB consists of 80 unique real-life databases covering a wide range of domains, from governance and healthcare to media, education and the natural sciences. At this scale, verification approaches that rely on "reading" the evidence break down, forcing a timely shift toward reasoning in executable programs. We conduct extensive experiments with 30 state-of-the-art proprietary and open-source (below 70B) LLMs and find that none exceed 83% accuracy, with more than half below 55%. Our analysis also reveals that both closed- and open-source models struggle with abstention (the ability to admit that there is no evidence to decide), raising doubts about their reliability in high-stakes data analysis. We release the benchmark, code, and the LLM leaderboard at https://claimdb.github.io .
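To make the shift from "reading" evidence to executing programs concrete, here is a minimal sketch using Python's standard-library `sqlite3` module. Everything in it is a hypothetical illustration, not ClaimDB's actual data format or the authors' pipeline: the two-table schema, the toy rows, the claim template, the function names, and the three verdict labels are all assumptions. The claim is checked with a JOIN-and-aggregate query instead of by inspecting rows, and an empty result triggers abstention rather than a guess.

```python
import sqlite3

# Hypothetical verdict labels (assumption: not necessarily ClaimDB's label names).
SUPPORTED = "SUPPORTED"
REFUTED = "REFUTED"
NOT_ENOUGH_INFO = "NOT ENOUGH INFO"


def build_toy_db() -> sqlite3.Connection:
    """Create a tiny in-memory, two-table database with made-up rows.

    It stands in for a real multi-table database; the data is illustrative only.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE hospitals (id INTEGER PRIMARY KEY, name TEXT, state TEXT);
        CREATE TABLE admissions (hospital_id INTEGER, year INTEGER, patients INTEGER);
        INSERT INTO hospitals VALUES (1, 'General', 'WA'), (2, 'Mercy', 'WA'), (3, 'St. Luke', 'OR');
        INSERT INTO admissions VALUES
            (1, 2022, 1200), (2, 2022, 800), (3, 2022, 950),
            (1, 2023, 1300), (2, 2023, 900), (3, 2023, 1000);
        """
    )
    return conn


def verify_total_admissions(conn: sqlite3.Connection, state: str, year: int, claimed_total: int) -> str:
    """Verify a hypothetical claim: 'Hospitals in <state> admitted <N> patients in <year>'.

    The evidence is a composition of two tables (JOIN) aggregated with SUM,
    computed by the database engine rather than read row by row.
    """
    (total,) = conn.execute(
        """
        SELECT SUM(a.patients)
        FROM admissions AS a
        JOIN hospitals AS h ON h.id = a.hospital_id
        WHERE h.state = ? AND a.year = ?
        """,
        (state, year),
    ).fetchone()
    if total is None:
        # SUM over zero matching rows is NULL: there is no evidence, so abstain.
        return NOT_ENOUGH_INFO
    return SUPPORTED if total == claimed_total else REFUTED


if __name__ == "__main__":
    db = build_toy_db()
    print(verify_total_admissions(db, "WA", 2022, 2000))  # SUPPORTED (1200 + 800)
    print(verify_total_admissions(db, "WA", 2023, 2000))  # REFUTED (actual total is 2200)
    print(verify_total_admissions(db, "CA", 2022, 500))   # NOT ENOUGH INFO (no CA rows)
```

The design point this sketch is meant to show is that the program, not the model's context window, touches the records. With millions of rows, the same query still runs inside the database, and the explicit "no matching rows" branch is one simple way to encode abstention.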

