CL · AI · Dec 14, 2024

Learning to Verify Summary Facts with Fine-Grained LLM Feedback

arXiv:2412.10689v1 · 22 citations · h-index: 8 · Has Code · COLING
Originality: Incremental advance
AI Analysis

This work addresses the scarcity of labeled data facing researchers and practitioners in automated fact verification, and it offers a more cost-effective alternative to human annotation. The advance is incremental, since it builds on existing LLM capabilities.

The paper tackles the shortage of human-labeled data for training summary fact verifiers by introducing FineSumFact, a large-scale dataset with fine-grained LLM-generated feedback. It shows that a lightweight model fine-tuned on this dataset outperforms models trained on smaller human-annotated datasets when evaluated on human-generated test sets.

Training automatic summary fact verifiers often faces the challenge of a lack of human-labeled data. In this paper, we explore an alternative way of leveraging Large Language Model (LLM) generated feedback to address the inherent limitation of using human-labeled data. We introduce FineSumFact, a large-scale dataset containing fine-grained factual feedback on summaries. We employ 10 distinct LLMs for diverse summary generation and Llama-3-70B-Instruct for feedback. We utilize this dataset to fine-tune the lightweight open-source model Llama-3-8B-Instruct, optimizing resource efficiency while maintaining high performance. Our experimental results reveal that the model trained on extensive LLM-generated datasets surpasses that trained on smaller human-annotated datasets when evaluated using human-generated test sets. Fine-tuning fact verification models with LLM feedback can be more effective and cost-efficient than using human feedback. The dataset is available at https://github.com/DISL-Lab/FineSumFact.
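
To make the idea of fine-grained feedback concrete, the sketch below shows one way sentence-level factuality labels could be stored, aggregated into a summary-level score, and turned into fine-tuning examples. It is an illustration only: the `SentenceFeedback` schema, the field names, the prompt wording, and both helper functions are assumptions made here, not the FineSumFact format or the authors' training code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SentenceFeedback:
    """Feedback for one summary sentence (hypothetical schema, not the dataset's)."""
    sentence: str
    is_factual: bool   # label assumed to come from the feedback LLM
    reason: str = ""   # optional free-text rationale


def summary_factuality(feedback: List[SentenceFeedback]) -> float:
    """Return the fraction of sentences judged factual (1.0 for an empty summary)."""
    if not feedback:
        return 1.0
    return sum(f.is_factual for f in feedback) / len(feedback)


def to_training_examples(document: str, feedback: List[SentenceFeedback]) -> List[dict]:
    """Build one prompt/completion pair per summary sentence for fine-tuning."""
    examples = []
    for f in feedback:
        # The prompt template is an assumption made for this sketch.
        prompt = (
            f"Document:\n{document}\n\n"
            f"Summary sentence:\n{f.sentence}\n\n"
            "Is this sentence supported by the document? Answer yes or no."
        )
        examples.append({"prompt": prompt, "completion": "yes" if f.is_factual else "no"})
    return examples
```

Under these assumptions, each labeled sentence becomes its own training example, so a smaller verifier could learn from per-sentence signals instead of a single summary-level judgment.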

Code Implementations: 1 repo
