Specious Sites: Tracking the Spread and Sway of Spurious News Stories at Scale
This work addresses the challenge of misinformation spread by giving researchers and fact-checkers a scalable tool to monitor and respond to spurious news. Its contribution is incremental, since it builds on existing clustering and NLP methods.
The paper tackles the problem of tracking misinformation narratives across unreliable news websites. It introduces an automated system that identifies and tracks narratives using daily scrapes of 1,334 sites, MPNet, and DP-Means clustering. The system identifies 52,036 narratives, and the authors analyze the most prevalent narratives of 2022 and the websites most influential in originating and amplifying them.
Misinformation, propaganda, and outright lies proliferate on the web, with some narratives having dangerous real-world consequences on public health, elections, and individual safety. However, despite the impact of misinformation, the research community largely lacks automated and programmatic approaches for tracking news narratives across online platforms. In this work, utilizing daily scrapes of 1,334 unreliable news websites, the large-language model MPNet, and DP-Means clustering, we introduce a system to automatically identify and track the narratives spread within online ecosystems. Identifying 52,036 narratives on these 1,334 websites, we describe the most prevalent narratives spread in 2022 and identify the most influential websites that originate and amplify narratives. Finally, we show how our system can be utilized to detect new narratives originating from unreliable news websites and to aid fact-checkers in more quickly addressing misinformation. We release code and data at https://github.com/hanshanley/specious-sites.
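The clustering step named in the abstract can be illustrated with a minimal DP-Means sketch. This is not the authors' implementation (see the linked repository for that). It assumes squared Euclidean distance, and it uses hypothetical 2-D toy vectors in place of real MPNet passage embeddings. The function names `dp_means`, `sq_dist`, and `mean`, and the threshold parameter `lam`, are illustrative choices. The key property shown is that DP-Means does not fix the number of clusters in advance: a point farther than `lam` from every existing centroid starts a new cluster.

```python
from math import fsum


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return fsum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [fsum(col) / n for col in zip(*vectors)]


def dp_means(points, lam, max_iter=100):
    """Cluster `points` with DP-Means; `lam` is the new-cluster threshold
    on squared distance (an assumed metric, chosen for illustration)."""
    centroids = [mean(points)]          # start with one global cluster
    labels = [0] * len(points)
    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            dists = [sq_dist(p, c) for c in centroids]
            k = min(range(len(dists)), key=dists.__getitem__)
            if dists[k] > lam:
                # Too far from every centroid: open a new cluster at p.
                centroids.append(list(p))
                k = len(centroids) - 1
            if labels[i] != k:
                labels[i] = k
                changed = True
        # Recompute centroids, dropping clusters that lost all members.
        new_centroids, remap = [], {}
        for k in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:
                remap[k] = len(new_centroids)
                new_centroids.append(mean(members))
        labels = [remap[l] for l in labels]
        centroids = new_centroids
        if not changed:
            break
    return labels, centroids


# Hypothetical stand-ins for embeddings: two tight pairs and one outlier.
toy_points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0]]
labels, centroids = dp_means(toy_points, lam=1.0)
print(labels, len(centroids))
```

A smaller `lam` yields more, finer-grained clusters, and a larger `lam` merges nearby groups. In a narrative-tracking setting, each resulting cluster would correspond to one candidate narrative.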