CL · AI · Dec 23, 2025

FaithLens: Detecting and Explaining Faithfulness Hallucination

arXiv:2512.20182v1 · 2 citations · h-index: 24
Originality: Incremental advance
AI Analysis

This work addresses the need for trustworthy LLM applications, such as retrieval-augmented generation and summarization, and offers an incremental improvement in hallucination detection and explanation.

The paper tackles faithfulness hallucination detection in large language model outputs by introducing FaithLens, a cost-efficient 8B-parameter model that returns a binary prediction together with an explanation. Across 12 diverse tasks, it is reported to outperform advanced models such as GPT-4.1 and o3.

Recognizing whether outputs from large language models (LLMs) contain faithfulness hallucination is crucial for real-world applications, e.g., retrieval-augmented generation and summarization. In this paper, we introduce FaithLens, a cost-efficient and effective faithfulness hallucination detection model that can jointly provide binary predictions and corresponding explanations to improve trustworthiness. To achieve this, we first synthesize training data with explanations via advanced LLMs and apply a well-defined data filtering strategy to ensure label correctness, explanation quality, and data diversity. Subsequently, we fine-tune the model on these well-curated training data as a cold start and further optimize it with rule-based reinforcement learning, using rewards for both prediction correctness and explanation quality. Results on 12 diverse tasks show that the 8B-parameter FaithLens outperforms advanced models such as GPT-4.1 and o3. Also, FaithLens can produce high-quality explanations, delivering a distinctive balance of trustworthiness, efficiency, and effectiveness.
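
To make the idea of a rule-based reward covering both prediction correctness and explanation quality more concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not specify the output format, the reward values, or how explanation quality is scored. The tag format, the function names `parse_output` and `rule_based_reward`, the equal weighting, and the pluggable `explanation_ok` check are all assumptions made for illustration.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical output format (an assumption, not taken from the paper):
#   <explanation>free text</explanation><answer>yes|no</answer>
# where "yes" means the claim is faithful to the source document.
_OUTPUT_PATTERN = re.compile(
    r"<explanation>(?P<explanation>.*?)</explanation>\s*"
    r"<answer>(?P<answer>yes|no)</answer>",
    re.DOTALL | re.IGNORECASE,
)


def parse_output(text: str) -> Optional[Tuple[str, bool]]:
    """Return (explanation, predicted_faithful), or None if the format is violated."""
    match = _OUTPUT_PATTERN.search(text)
    if match is None:
        return None
    explanation = match.group("explanation").strip()
    predicted_faithful = match.group("answer").lower() == "yes"
    return explanation, predicted_faithful


def rule_based_reward(
    output: str,
    gold_faithful: bool,
    explanation_ok: Callable[[str], bool],
) -> float:
    """Score one model output with simple rules.

    - 0.0 if the output cannot be parsed (format violation).
    - +1.0 if the binary prediction matches the gold label.
    - +1.0 if the explanation is non-empty and passes `explanation_ok`,
      a caller-supplied check standing in for an explanation-quality signal.
    The equal weighting is an illustrative assumption.
    """
    parsed = parse_output(output)
    if parsed is None:
        return 0.0
    explanation, predicted_faithful = parsed

    reward = 0.0
    if predicted_faithful == gold_faithful:
        reward += 1.0
    if explanation and explanation_ok(explanation):
        reward += 1.0
    return reward


if __name__ == "__main__":
    sample = (
        "<explanation>The claim restates a fact given in the document."
        "</explanation><answer>yes</answer>"
    )
    # A deliberately crude quality proxy: require at least three words.
    print(rule_based_reward(sample, True, lambda e: len(e.split()) >= 3))
```

In a reinforcement learning loop, a function of this shape would be called once per sampled output to produce the scalar reward. Keeping the explanation check as a parameter lets the crude word-count proxy be replaced by whatever quality signal is actually available.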

