CHECKWHY: Causal Fact Verification via Argument Structure
This addresses the need for better reasoning in fact verification for AI systems, but it is incremental as it focuses on a specific dataset and task.
The authors tackled the lack of explicit logical reasoning in fact verification by introducing CheckWhy, a dataset of over 19K triplets for causal fact verification, and found that incorporating argument structures improves performance, though models struggle to generate satisfying structures.
With the growing complexity of fact verification tasks, the concern with "thoughtful" reasoning capabilities is increasing. However, recent fact verification benchmarks mainly focus on checking a narrow scope of semantic factoids within claims and lack an explicit logical reasoning process. In this paper, we introduce CheckWhy, a challenging dataset tailored to a novel causal fact verification task: checking the truthfulness of the causal relation within claims through rigorous reasoning steps. CheckWhy consists of over 19K "why" claim-evidence-argument structure triplets with supports, refutes, and not enough info labels. Each argument structure is composed of connected evidence, representing the reasoning process that begins with foundational evidence and progresses toward claim establishment. Through extensive experiments on state-of-the-art models, we validate the importance of incorporating the argument structure for causal fact verification. Moreover, the automated and human evaluation of argument structure generation reveals the difficulty in producing satisfying argument structure by fine-tuned models or Chain-of-Thought prompted LLMs, leaving considerable room for future improvements.