CL · Feb 16, 2024

Assessing the Reasoning Capabilities of LLMs in the context of Evidence-based Claim Verification

arXiv:2402.10735v4 · 16 citations · h-index: 12 · ACL
Originality: Incremental advance
AI Analysis

This work addresses the open problem of assessing LLMs' reasoning in evidence-based claim verification. It contributes a new benchmark, but its evaluation of existing models is an incremental advance.

The authors evaluate LLMs' reasoning capabilities beyond mathematics and coding by creating RECV, the first claim verification benchmark built on real-world claims for assessing deductive and abductive reasoning. They find that LLMs handle deductive reasoning but consistently fail at abductive reasoning, and that rationale generation does not always help.

Although LLMs have shown great performance on mathematics- and coding-related reasoning tasks, their capabilities regarding other forms of reasoning are still an open problem. Here, we examine the issue of reasoning from the perspective of claim verification. We propose a framework designed to break down any claim paired with evidence into the atomic reasoning types that are necessary for verification. We use this framework to create RECV, the first claim verification benchmark, incorporating real-world claims, to assess the deductive and abductive reasoning capabilities of LLMs. The benchmark comprises three datasets, covering reasoning problems of increasing complexity. We evaluate three state-of-the-art proprietary LLMs under multiple prompt settings. Our results show that while LLMs can address deductive reasoning problems, they consistently fail in cases of abductive reasoning. Moreover, we observe that enhancing LLMs with rationale generation is not always beneficial. Nonetheless, we find that generated rationales are semantically similar to those provided by humans, especially in deductive reasoning cases.

