CL · May 14, 2025

Atomic Consistency Preference Optimization for Long-Form Question Answering

arXiv:2505.09039v2 · 3 citations · h-index: 5 · IJCNLP-AACL
Originality: Highly original
AI Analysis

This paper addresses the factual unreliability of LLMs in long-form question answering, offering a novel self-supervised approach that removes the dependency on external models or knowledge bases.

The paper tackles factoid hallucinations in Large Language Models by proposing Atomic Consistency Preference Optimization (ACPO), a self-supervised method that improves factual accuracy without external supervision. ACPO outperforms a strong supervised alignment baseline by 1.95 points, averaged across Phi-3 and Llama3 on the LongFact and BioGen datasets.

Large Language Models (LLMs) often produce factoid hallucinations: plausible yet incorrect answers. A common mitigation strategy is model alignment, which improves factual accuracy by training on curated (factual, non-factual) pairs. However, this approach often relies on a stronger model (e.g., GPT-4) or an external knowledge base to assess factual correctness, and such resources may not always be accessible. To address this, we propose Atomic Consistency Preference Optimization (ACPO), a self-supervised preference-tuning method that enhances factual accuracy without external supervision. ACPO leverages atomic consistency signals (i.e., the agreement of individual facts across multiple stochastic responses) to identify high- and low-quality data pairs for model alignment. Despite being fully self-supervised, ACPO outperforms the strong supervised alignment baseline by 1.95 points averaged across Phi-3 and Llama3 on the LongFact and BioGen datasets, demonstrating its effectiveness in improving factual reliability without relying on external models or knowledge bases.
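The abstract describes atomic consistency only at a high level. The sketch below illustrates one plausible reading of it, under several simplifying assumptions that are not taken from the paper: each sampled response has already been decomposed into atomic facts represented as normalized strings, two facts "agree" only on exact string match (a real system would need semantic matching), a fact's agreement is the fraction of the *other* samples that state it, and a response's score is the mean agreement of its facts. The most and least consistent responses then form a (chosen, rejected) pair for preference tuning. The names `atomic_consistency_scores` and `select_preference_pair` are hypothetical and do not come from the paper's repository.

```python
from collections import Counter
from typing import List, Tuple


def atomic_consistency_scores(responses_facts: List[List[str]]) -> List[float]:
    """Score each sampled response by how often its atomic facts recur in the other samples.

    ASSUMPTION: facts are pre-extracted, normalized strings, and agreement is exact match.
    """
    n = len(responses_facts)
    # Number of responses that state each fact (duplicates within one response count once).
    support = Counter(fact for facts in responses_facts for fact in set(facts))
    scores: List[float] = []
    for facts in responses_facts:
        unique = set(facts)
        if not unique or n < 2:
            # No facts, or no other samples to agree with: treat as zero consistency.
            scores.append(0.0)
            continue
        # For each fact, the fraction of the other n - 1 samples that also state it.
        agreement = [(support[f] - 1) / (n - 1) for f in unique]
        scores.append(sum(agreement) / len(agreement))
    return scores


def select_preference_pair(
    responses: List[str], responses_facts: List[List[str]]
) -> Tuple[str, str]:
    """Return (chosen, rejected): the most and least atomically consistent responses."""
    scores = atomic_consistency_scores(responses_facts)
    best = max(range(len(responses)), key=scores.__getitem__)
    worst = min(range(len(responses)), key=scores.__getitem__)
    return responses[best], responses[worst]


if __name__ == "__main__":
    # Three hypothetical stochastic samples for one prompt, with pre-extracted facts.
    facts = [
        ["paris is the capital of france", "the eiffel tower is in paris"],
        ["paris is the capital of france", "the eiffel tower is in paris"],
        ["lyon is the capital of france", "the eiffel tower is in paris"],
    ]
    print(atomic_consistency_scores(facts))  # [0.75, 0.75, 0.5]
    print(select_preference_pair(["A", "B", "C"], facts))  # ('A', 'C')
```

In this toy example, the fact stated by only one sample ("lyon is the capital of france") drags response C down, so it becomes the rejected side of the pair. Ties are broken by taking the first index, which is an arbitrary choice made for the sketch; the resulting pair would then be passed to a preference-tuning objective, which is not shown here.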

Code Implementations: 1 repo