Can Humans Tell? A Dual-Axis Study of Human Perception of LLM-Generated News

Alexander Loth, Martin Kappes, Marc-Oliver Pahl

arXiv:2604.03755

AI Analysis

For policymakers and platform designers, this shows that user-side detection is not a viable defense against LLM-generated disinformation, motivating system-level countermeasures.

Humans cannot reliably distinguish LLM-generated news from human-written news (p > .05), with accuracy degrading after ~30 evaluations due to cognitive fatigue. The finding holds across six LLMs, including small open-weight models.

Can humans tell whether a news article was written by a person or a large language model (LLM)? We investigate this question using JudgeGPT, a study platform that independently measures source attribution (human vs. machine) and authenticity judgment (legitimate vs. fake) on continuous scales. From 2,318 judgments collected from 1,054 participants across content generated by six LLMs, we report five findings: (1) participants cannot reliably distinguish machine-generated from human-written text (p > .05, Welch's t-test); (2) this inability holds across all tested models, including open-weight models with as few as 7B parameters; (3) self-reported domain expertise predicts judgment accuracy (r = .35, p < .001) whereas political orientation does not (r = -.10, n.s.); (4) clustering reveals distinct response strategies ("Skeptics" vs. "Believers"); and (5) accuracy degrades after approximately 30 sequential evaluations due to cognitive fatigue. The answer, in short, is no: humans cannot reliably tell. These results indicate that user-side detection is not a viable defense and motivate system-level countermeasures such as cryptographic content provenance.
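The abstract's first finding rests on Welch's t-test, which compares two group means without assuming equal variances. The following is a minimal sketch of that statistic in pure Python, not the authors' analysis code. The function name `welch_t`, the 0-to-1 score scale, and the sample values are all hypothetical and are not data from the study.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    se2 = va / na + vb / nb            # squared standard error of the mean difference
    t = (mean(a) - mean(b)) / se2 ** 0.5
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Hypothetical source-attribution scores (0 = surely human, 1 = surely machine).
# Illustrative values only; these are NOT the study's data.
human_written = [0.42, 0.55, 0.61, 0.38, 0.50, 0.47]
llm_generated = [0.45, 0.52, 0.58, 0.41, 0.49, 0.53]

t_stat, dof = welch_t(human_written, llm_generated)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

Turning the statistic into a p-value requires the CDF of Student's t distribution, which the standard library does not provide. With SciPy installed, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and also returns the p-value.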
