DiVA: Fine-grained Factuality Verification with Agentic-Discriminative Verifier
This work addresses the need for more nuanced factuality assessment of LLM outputs, which matters for applications such as fine-grained evaluation and preference optimization. It is an incremental advance in verification methods.
The paper tackles coarse binary factuality verification of LLM outputs by proposing DiVA, a hybrid framework that combines generative and discriminative models for fine-grained verification, and reports significant improvements over existing methods on a new benchmark, FGVeriBench.
Despite the significant advances of Large Language Models (LLMs), their factuality remains a critical challenge, fueling growing interest in factuality verification. Existing research on factuality verification primarily makes binary judgments (e.g., correct or incorrect), which fail to distinguish varying degrees of error severity. This limits the utility of such verifiers for applications such as fine-grained evaluation and preference optimization. To bridge this gap, we propose the Agentic Discriminative Verifier (DiVA), a hybrid framework that combines the agentic search capabilities of generative models with the precise scoring ability of discriminative models. We also construct a new benchmark, FGVeriBench, as a robust testbed for fine-grained factuality verification. Experimental results on FGVeriBench demonstrate that DiVA significantly outperforms existing methods on factuality verification for both general and multi-hop questions.