MegaFake: A Theory-Driven Dataset of Fake News Generated by Large Language Models
This work addresses the threat that fake news poses in digital environments, a concern for researchers and policymakers, though its contribution is incremental because it builds on existing datasets and methods.
The authors address LLM-generated fake news by developing a theoretical framework and an automated pipeline, which they use to create MegaFake, a dataset derived from GossipCop and intended as a resource for analysis and future research.
The advent of large language models (LLMs) has revolutionized online content creation, making it much easier to generate high-quality fake news. This misuse threatens the integrity of our digital environment and ethical standards. Therefore, understanding the motivations and mechanisms behind LLM-generated fake news is crucial. In this study, we analyze the creation of fake news from a social psychology perspective and develop a comprehensive LLM-based theoretical framework, LLM-Fake Theory. We introduce a novel pipeline that automates the generation of fake news using LLMs, thereby eliminating the need for manual annotation. Utilizing this pipeline, we create a theoretically informed Machine-generated Fake news dataset, MegaFake, derived from the GossipCop dataset. We conduct comprehensive analyses to evaluate our MegaFake dataset. We believe that our dataset and insights will provide valuable contributions to future research focused on the detection and governance of fake news in the era of LLMs.