CL · AI · Oct 18, 2024

Real-time Factuality Assessment from Adversarial Feedback

arXiv:2410.14651v3 · 6 citations · h-index: 14 · Has Code · ACL
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of assessing the factuality of real-time news with LLM-based detectors and highlights vulnerabilities in current evaluation methods, though its advance in adversarial testing is incremental.

The authors show that existing datasets are insufficient for evaluating the factuality of news, and they develop a pipeline that uses adversarial feedback to create deceptive news variants, reducing the ROC-AUC of a strong RAG-based GPT-4o detector by an absolute 17.5 percentage points.
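The paper's abstract describes the ROC-AUC reduction as absolute, meaning a difference between two ROC-AUC values, not a ratio. The sketch below illustrates that reading on hypothetical toy data. The labels, scores, and the `roc_auc` helper are assumptions made for illustration and are not taken from the paper or its repository.

```python
from itertools import product


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive example
    outscores a random negative one (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: label 1 = fake article, score = detector's "fake" probability.
labels = [1, 1, 1, 0, 0, 0]
scores_before = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]    # fakes clearly separated
scores_after = [0.9, 0.25, 0.15, 0.3, 0.2, 0.1]   # two fakes now score low

auc_before = roc_auc(labels, scores_before)
auc_after = roc_auc(labels, scores_after)

# Absolute drop: a plain difference of the two ROC-AUC values.
absolute_drop = auc_before - auc_after
print(f"before={auc_before:.3f} after={auc_after:.3f} drop={absolute_drop:.3f}")
```

The pairwise definition used here is equivalent to the area under the ROC curve and avoids any third-party dependency. On real data a library routine would normally be preferable.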

We show that existing evaluations for assessing the factuality of news from conventional sources, such as claims on fact-checking websites, result in high accuracies over time for LLM-based detectors, even after their knowledge cutoffs. This suggests that recent popular false information from such sources can be easily identified due to its likely presence in pre-training/retrieval corpora or the emergence of salient, yet shallow, patterns in these datasets. Instead, we argue that a proper factuality evaluation dataset should test a model's ability to reason about current events by retrieving and reading related evidence. To this end, we develop a novel pipeline that leverages natural language feedback from a RAG-based detector to iteratively modify real-time news into deceptive variants that challenge LLMs. Our iterative rewrite decreases the binary classification ROC-AUC by an absolute 17.5 percent for a strong RAG-based GPT-4o detector. Our experiments reveal the important role of RAG in both evaluating and generating challenging news examples, as retrieval-free LLM detectors are vulnerable to unseen events and adversarial attacks, while feedback from RAG-based evaluation helps discover more deceitful patterns.
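The feedback loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The `Verdict` type, the `adversarial_rewrite` function, the `max_rounds` and `threshold` parameters, and the toy detector and rewriter are all hypothetical stand-ins for the RAG-based detector and the LLM rewriter.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Verdict:
    fake_probability: float  # detector's belief that the article is false
    rationale: str           # natural-language feedback explaining that belief


Detector = Callable[[str], Verdict]
Rewriter = Callable[[str, str], str]  # (text, feedback) -> new text


def adversarial_rewrite(
    real_article: str,
    detect: Detector,
    rewrite: Rewriter,
    max_rounds: int = 3,      # assumed budget, not from the paper
    threshold: float = 0.5,   # assumed decision threshold
) -> Tuple[str, Verdict, int]:
    """Create a deceptive variant, then revise it using detector feedback
    until it is no longer flagged or the round budget is spent."""
    candidate = rewrite(real_article, "")  # initial variant, no feedback yet
    verdict = detect(candidate)
    rounds = 0
    while verdict.fake_probability >= threshold and rounds < max_rounds:
        candidate = rewrite(candidate, verdict.rationale)
        verdict = detect(candidate)
        rounds += 1
    return candidate, verdict, rounds


def toy_detect(text: str) -> Verdict:
    # Toy stand-in for a RAG-based detector: reports the first cue word found.
    for cue in ("shocking", "miracle"):
        if cue in text.lower():
            return Verdict(0.9, f"sensational word: {cue}")
    return Verdict(0.2, "no obvious cue")


def toy_rewrite(text: str, feedback: str) -> str:
    # Toy stand-in for an LLM rewriter: drops the word the detector named.
    if not feedback:
        return "Shocking miracle cure announced. " + text
    cue = feedback.rsplit(": ", 1)[-1]
    kept = [w for w in text.split() if w.lower().strip(".,") != cue]
    return " ".join(kept)


final_text, final_verdict, rounds_used = adversarial_rewrite(
    "City council approves new budget.", toy_detect, toy_rewrite
)
print(rounds_used, final_verdict.fake_probability, final_text)
```

Passing the detector and rewriter as plain callables keeps the loop independent of any particular model or retrieval stack, so the toy functions could be swapped for real model calls without changing the control flow.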

Code Implementations: 1 repo
