Eroding the Truth-Default: A Causal Analysis of Human Susceptibility to Foundation Model Hallucinations and Disinformation in the Wild

Alexander Loth, Martin Kappes, Marc-Oliver Pahl

arXiv:2601.22871v2 | 3 citations | h-index: 14

Originality: Incremental advance

AI Analysis

This research addresses the problem of ensuring trustworthy information ecosystems for web intelligence users. Its findings suggest that interventions should target cognitive source monitoring rather than demographic segmentation.

The study analyzes human susceptibility to foundation model hallucinations and disinformation, framed as the challenge of distinguishing synthetic from organic content. It finds that fake news familiarity is a candidate mediator of detection performance ($r=0.35$), and that GPT-4 outputs (HumanMachineScore: 0.20) bypass source monitoring mechanisms.

As foundation models (FMs) approach human-level fluency, distinguishing synthetic from organic content has become a key challenge for Trustworthy Web Intelligence. This paper presents JudgeGPT and RogueGPT, a dual-axis framework that decouples "authenticity" from "attribution" to investigate the mechanisms of human susceptibility. Analyzing 918 evaluations across five FMs (including GPT-4 and Llama-2), we employ Structural Causal Models (SCMs) as a principal framework for formulating testable causal hypotheses about detection accuracy. Contrary to partisan narratives, we find that political orientation shows a negligible association with detection performance ($r=-0.10$). Instead, "fake news familiarity" emerges as a candidate mediator ($r=0.35$), suggesting that exposure may function as adversarial training for human discriminators. We identify a "fluency trap" where GPT-4 outputs (HumanMachineScore: 0.20) bypass Source Monitoring mechanisms, rendering them indistinguishable from human text. These findings suggest that "pre-bunking" interventions should target cognitive source monitoring rather than demographic segmentation to ensure trustworthy information ecosystems.
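
The associations reported above ($r=-0.10$ for political orientation, $r=0.35$ for fake news familiarity) read as correlation coefficients. As a rough illustration only, the sketch below computes a Pearson correlation between a per-participant familiarity rating and a detection accuracy. It assumes the reported $r$ values are Pearson coefficients, which the abstract does not state. The function name `pearson_r` and the toy values are hypothetical and are not taken from the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered cross-products and centered squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented toy values for illustration; NOT data from the paper.
    familiarity = [1, 2, 3, 4, 5]           # hypothetical self-reported rating
    accuracy = [0.4, 0.5, 0.5, 0.7, 0.6]    # hypothetical detection accuracy
    print(f"r = {pearson_r(familiarity, accuracy):.2f}")
```

A coefficient of this kind only describes association. Whether familiarity actually mediates detection performance is a causal question, which is why the abstract frames it as a candidate mediator to be tested within a Structural Causal Model.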
