CL | AI | Jan 28, 2025

FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop Data

arXiv:2501.17144v1 | 15 citations | h-index: 7 | NAACL
Originality: Highly original
AI Analysis

This work addresses the challenge of document-level factuality classification for LLM outputs, offering a more efficient and effective solution for researchers and practitioners in AI safety.

The paper tackles hallucination detection in large language models with FactCG, a fact checker trained on synthetic data generated through graph-based multi-hop reasoning. Despite its smaller model size, FactCG outperforms GPT-4-o on the LLM-Aggrefact benchmark.

Prior research on training grounded factuality classification models to detect hallucinations in large language models (LLMs) has relied on public natural language inference (NLI) data and synthetic data. However, conventional NLI datasets are not well-suited for document-level reasoning, which is critical for detecting LLM hallucinations. Recent approaches to document-level synthetic data generation involve iteratively removing sentences from documents and annotating factuality using LLM-based prompts. While effective, this method is computationally expensive for long documents and limited by the LLM's capabilities. In this work, we analyze the differences between existing synthetic training data used in state-of-the-art models and real LLM output claims. Based on our findings, we propose a novel approach for synthetic data generation, CG2C, that leverages multi-hop reasoning on context graphs extracted from documents. Our fact checker model, FactCG, demonstrates improved performance with more connected reasoning, using the same backbone models. Experiments show it even outperforms GPT-4-o on the LLM-Aggrefact benchmark with a much smaller model size.
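
To make the idea of multi-hop reasoning on a context graph concrete, here is a minimal Python sketch. It is not the authors' CG2C pipeline: the triples are hand-written stand-ins for whatever extraction step the paper uses, and the function names (`build_graph`, `multi_hop_paths`, `path_to_claim`) and the claim template are assumptions made purely for illustration. It only shows how chaining edges in a document-derived graph can yield a claim that requires connecting more than one fact.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples. In a real pipeline these
# would come from an extraction step over the document, which is not shown.
TRIPLES = [
    ("Marie Curie", "was born in", "Warsaw"),
    ("Warsaw", "is the capital of", "Poland"),
    ("Marie Curie", "won", "the Nobel Prize in Physics"),
]


def build_graph(triples):
    """Map each subject to its outgoing (relation, object) edges."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
    return graph


def multi_hop_paths(graph, start, hops):
    """Return every cycle-free path of exactly `hops` edges from `start`."""
    frontier = [(start, [])]
    for _ in range(hops):
        next_frontier = []
        for node, path in frontier:
            # Nodes already on this path, so we never revisit them.
            visited = {start} | {obj for _, _, obj in path}
            for rel, obj in graph.get(node, []):
                if obj not in visited:
                    next_frontier.append((obj, path + [(node, rel, obj)]))
        frontier = next_frontier
    return [path for _, path in frontier]


def path_to_claim(path):
    """Render a path as one sentence using a simple illustrative template."""
    subj, rel, obj = path[0]
    text = f"{subj} {rel} {obj}"
    for _, rel, obj in path[1:]:
        text += f", which {rel} {obj}"
    return text + "."


graph = build_graph(TRIPLES)
for path in multi_hop_paths(graph, "Marie Curie", hops=2):
    print(path_to_claim(path))
# prints "Marie Curie was born in Warsaw, which is the capital of Poland."
```

Verifying the printed claim requires combining two separate facts from the source, which is the kind of connected, document-level reasoning the abstract contrasts with conventional sentence-level NLI data.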

Code Implementations: 1 repo