CL · AI · CR · LG · Sep 5, 2024

LLM Detectors Still Fall Short of Real World: Case of LLM-Generated Short News-Like Posts

arXiv:2409.03291v2 · 9 citations · h-index: 5 · Has Code
AI Analysis

This work addresses the challenge of detecting AI-generated disinformation for information security, highlighting that current solutions are incremental and not ready for practical use.

The paper tackles LLM-generated disinformation in short news-like posts. It shows that existing LLM detectors, both zero-shot and purpose-trained, are ineffective in real-world settings: zero-shot detectors behave inconsistently and are vulnerable to trivial attacks such as raising the sampling temperature, while purpose-trained detectors fail to generalize to new human-written texts.

With the emergence of widely available, powerful large language models (LLMs), LLM-generated disinformation has become a major concern. Historically, LLM detectors have been touted as a solution, but their effectiveness in the real world remains to be proven. In this paper, we focus on an important setting in information operations: short news-like posts generated by moderately sophisticated attackers. We demonstrate that existing LLM detectors, whether zero-shot or purpose-trained, are not ready for real-world use in that setting. All tested zero-shot detectors perform inconsistently with prior benchmarks and are highly vulnerable to an increase in sampling temperature, a trivial attack absent from recent benchmarks. A purpose-trained detector that generalizes across LLMs and unseen attacks can be developed, but it fails to generalize to new human-written texts. We argue that the former finding indicates a need for domain-specific benchmarking, while the latter suggests a trade-off between resilience to adversarial evasion and overfitting to the reference human text; both need to be evaluated in benchmarks, and currently are not. We believe this calls for reconsidering current approaches to benchmarking LLM detectors, and we provide a dynamically extensible benchmark to enable this (https://github.com/Reliable-Information-Lab-HEVS/benchmark_llm_texts_detection).
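The abstract reports that simply raising the sampling temperature is enough to evade the zero-shot detectors tested. The sketch below illustrates one plausible mechanism for likelihood-based zero-shot detectors, which score text by how probable the model finds it; it is not the paper's code, and not every detector the paper tested works this way. Assumptions, all hypothetical: a context-free toy "model" with made-up logits over a six-word vocabulary, an attacker sampling from `softmax(logits / T)`, and a detector that scores text by mean per-token log-likelihood under the same model at T=1 (higher score means "more machine-like").

```python
import math
import random

# Hypothetical toy model: context-free (unigram) logits over a tiny vocabulary.
# Real zero-shot detectors score each token given its prefix; this toy only
# illustrates the direction of the temperature effect.
LOGITS = {"the": 3.0, "a": 2.0, "news": 1.5, "report": 1.0, "zebra": -1.0, "quark": -2.0}


def softmax(logits: dict[str, float], temperature: float = 1.0) -> dict[str, float]:
    """Temperature-scaled softmax: p_i is proportional to exp(z_i / T)."""
    scaled = {tok: z / temperature for tok, z in logits.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - m) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def expected_loglik_score(temperature: float) -> float:
    """Expected per-token log-likelihood that a detector using the T=1 model
    assigns to text an attacker samples at `temperature`."""
    detector_p = softmax(LOGITS, 1.0)
    attacker_q = softmax(LOGITS, temperature)
    return sum(attacker_q[t] * math.log(detector_p[t]) for t in LOGITS)


def sample_text(n_tokens: int, temperature: float, seed: int = 0) -> list[str]:
    """Draw tokens from the temperature-scaled distribution (seeded for reproducibility)."""
    rng = random.Random(seed)
    q = softmax(LOGITS, temperature)
    return rng.choices(list(q), weights=list(q.values()), k=n_tokens)


def loglik_score(tokens: list[str]) -> float:
    """Hypothetical detector score: mean log-likelihood under the T=1 model."""
    p = softmax(LOGITS, 1.0)
    return sum(math.log(p[t]) for t in tokens) / len(tokens)


if __name__ == "__main__":
    for T in (0.7, 1.0, 1.5, 2.0):
        sampled = loglik_score(sample_text(2000, T))
        print(f"T={T}: expected score {expected_loglik_score(T):.3f}, sampled score {sampled:.3f}")
```

In this toy, the score falls steadily as T rises: sampling at higher temperature shifts probability mass toward tokens the detector's model considers unlikely, so a threshold tuned on T=1 generations increasingly labels high-temperature text as human-written. With β = 1/T, the derivative of the expected log-likelihood with respect to β is the variance of log p under the sampling distribution, which is non-negative, so the decrease with T is monotone here. The magnitude of the effect on any real detector is not something this sketch can predict.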

Code Implementations: 2 repos