TrendFact: A Benchmark for Explainable Hotspot Perception in Fact-Checking with Natural Language Explanation
The work addresses the need for more comprehensive and transparent fact-checking benchmarks, particularly for high-influence events, although its contribution is incremental in that it builds on existing datasets and methods.
The authors introduce TrendFact, a benchmark for evaluating hotspot perception and all fact-checking tasks, consisting of 7,643 samples and 366,634 evidence entries, and find that current systems face significant limitations on it. They also propose FactISR, a reasoning framework that improves the performance of reasoning large language models in explainable fact-checking.
Fact-checking benchmarks provide standardized testing criteria for automated fact-checking systems, driving technological advancement. With the surge of misinformation on social media and the emergence of various fact-checking methods, public concern has grown about the transparency of automated systems and the accuracy of fact-checking for high-influence events. However, existing benchmarks fail to meet these urgent needs and are predominantly English-centric, hindering the progress of comprehensive fact-checking. To address these issues, we introduce TrendFact, the first benchmark capable of evaluating hotspot perception ability (HPA) and all fact-checking tasks. TrendFact consists of 7,643 curated samples sourced from trending platforms and professional fact-checking datasets, as well as an evidence library containing 366,634 entries with publication dates. Additionally, to complement existing benchmarks in evaluating system explanation consistency and HPA, we propose two new metrics: ECS and HCPI. Experimental results show that current fact-checking systems face significant limitations when evaluated on TrendFact, a finding that can help guide the development of more robust fact-checking methods. Furthermore, to enhance the capabilities of existing advanced fact-checking systems, namely reasoning large language models (RLMs), we propose FactISR, a reasoning framework that integrates dynamic evidence augmentation with influence score-based iterative self-reflection. FactISR effectively improves RLMs' performance, offering new insights into explainable and complex fact-checking.