CaseFacts: A Benchmark for Legal Fact-Checking and Precedent Retrieval
CaseFacts addresses legal fact-checking for researchers and practitioners in AI and law. The contribution is incremental: it builds on existing fact-checking benchmarks by narrowing the focus to a single domain.
The authors address automated fact-checking in the legal domain by introducing CaseFacts, a benchmark for verifying colloquial legal claims against U.S. Supreme Court precedents. The dataset contains 6,294 claims labeled Supported, Refuted, or Overruled. Experiments show that state-of-the-art LLMs struggle with the task, and that web search augmentation degrades performance relative to closed-book baselines.
Automated Fact-Checking has largely focused on verifying general knowledge against static corpora, overlooking high-stakes domains like law where truth is evolving and technically complex. We introduce CaseFacts, a benchmark for verifying colloquial legal claims against U.S. Supreme Court precedents. Unlike existing resources that map formal texts to formal texts, CaseFacts challenges systems to bridge the semantic gap between layperson assertions and technical jurisprudence while accounting for temporal validity. The dataset consists of 6,294 claims categorized as Supported, Refuted, or Overruled. We construct this benchmark using a multi-stage pipeline that leverages Large Language Models (LLMs) to synthesize claims from expert case summaries, employing a novel semantic similarity heuristic to efficiently identify and verify complex legal overrulings. Experiments with state-of-the-art LLMs reveal that the task remains challenging; notably, augmenting models with unrestricted web search degrades performance compared to closed-book baselines due to the retrieval of noisy, non-authoritative precedents. We release CaseFacts to spur research into legal fact verification systems.
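To make the three-way labeling concrete, here is a minimal Python sketch of how a claim record and a per-label score might be represented. Only the three label names come from the abstract. The `Claim` fields, the `per_label_accuracy` helper, and the placeholder records are assumptions made for illustration; they are not the released dataset schema or the authors' evaluation code.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    """The three labels named in the abstract."""
    SUPPORTED = "Supported"
    REFUTED = "Refuted"
    OVERRULED = "Overruled"


@dataclass(frozen=True)
class Claim:
    """Hypothetical record layout; field names are assumptions."""
    text: str         # colloquial, layperson-style assertion
    case_name: str    # precedent the claim is checked against
    verdict: Verdict  # gold label


def per_label_accuracy(claims, predictions):
    """Return the fraction of correctly predicted claims for each gold label."""
    if len(claims) != len(predictions):
        raise ValueError("claims and predictions must have the same length")
    totals = Counter()
    correct = Counter()
    for claim, predicted in zip(claims, predictions):
        totals[claim.verdict] += 1
        if predicted == claim.verdict:
            correct[claim.verdict] += 1
    return {label: correct[label] / totals[label] for label in totals}


# Placeholder records, not drawn from CaseFacts.
demo_claims = [
    Claim("Placeholder claim A", "Example v. Placeholder", Verdict.SUPPORTED),
    Claim("Placeholder claim B", "Example v. Placeholder", Verdict.REFUTED),
    Claim("Placeholder claim C", "Sample v. Placeholder", Verdict.OVERRULED),
    Claim("Placeholder claim D", "Sample v. Placeholder", Verdict.OVERRULED),
]
demo_predictions = [
    Verdict.SUPPORTED,
    Verdict.SUPPORTED,
    Verdict.OVERRULED,
    Verdict.REFUTED,
]

if __name__ == "__main__":
    for label, score in per_label_accuracy(demo_claims, demo_predictions).items():
        print(f"{label.value}: {score:.2f}")
```

Reporting scores per label rather than as a single aggregate is one way to see whether a system handles Overruled claims, the temporally sensitive case, worse than the other two.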