CRED-1: An Open Multi-Signal Domain Credibility Dataset for Automated Pre-Bunking of Online Misinformation

Alexander Loth, Martin Kappes, Marc-Oliver Pahl

arXiv:2604.20856 | h-index: 14 | Has Code

Originality: Synthesis-oriented

AI Analysis

Provides a reproducible, privacy-preserving dataset for automated misinformation detection at the domain level, but is incremental as it combines existing lists and standard signals.

The authors created CRED-1, a dataset of 2,672 domains with credibility scores (0.0–1.0) by combining two source lists and four enrichment signals, designed for on-device pre-bunking of misinformation in browser extensions.

This article presents CRED-1, an open, reproducible domain-level credibility dataset combining two openly licensed source lists (OpenSources.co and Iffy.news) with four computed enrichment signals: domain age (WHOIS/RDAP), web popularity (Tranco Top-1M), fact-check frequency (Google Fact Check Tools API), and threat intelligence (Google Safe Browsing API). The dataset covers 2,672 domains categorized as fake, unreliable, mixed, conspiracy, or satire, each assigned a composite credibility score between 0.0 and 1.0. CRED-1 is designed for on-device deployment in privacy-preserving browser extensions to enable client-side pre-bunking of misinformation at the content delivery stage. The entire pipeline is implemented in Python using only standard library modules and is fully reproducible from publicly available sources. The dataset and pipeline code are released under CC BY 4.0 and archived on Zenodo.
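
To make the idea of a composite score concrete, the following is a minimal sketch of how a category label and the four enrichment signals could be folded into one value in [0.0, 1.0]. It is not the authors' formula: the abstract does not state the weighting, so the base scores in `CATEGORY_BASE`, every weight, the caps, and the names `DomainSignals` and `composite_score` are all illustrative assumptions. Like the described pipeline, it uses only the Python standard library.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical base score per category label (assumed values, not from the paper).
CATEGORY_BASE = {
    "fake": 0.10,
    "conspiracy": 0.15,
    "unreliable": 0.25,
    "mixed": 0.40,
    "satire": 0.50,
}


@dataclass
class DomainSignals:
    """Illustrative container for one domain's label and enrichment signals."""
    category: str                   # one of the five CRED-1 categories
    domain_age_years: float         # e.g. derived from WHOIS/RDAP registration date
    tranco_rank: Optional[int]      # rank in Tranco Top-1M, None if unlisted
    fact_check_count: int           # number of fact-checks found for the domain
    flagged_by_safe_browsing: bool  # threat-intelligence hit


def composite_score(s: DomainSignals) -> float:
    """Combine the signals into a score clamped to [0.0, 1.0] (assumed weights)."""
    score = CATEGORY_BASE[s.category]
    # Older domains earn a small bonus, capped at 10 years.
    score += 0.10 * min(s.domain_age_years, 10.0) / 10.0
    # Popular domains earn a small bonus that shrinks with rank.
    if s.tranco_rank is not None:
        score += 0.10 * (1.0 - (s.tranco_rank - 1) / 1_000_000)
    # Each fact-check lowers the score, capped at 10 fact-checks.
    score -= 0.02 * min(s.fact_check_count, 10)
    # A threat-intelligence hit applies a fixed penalty.
    if s.flagged_by_safe_browsing:
        score -= 0.20
    return max(0.0, min(1.0, score))
```

A browser extension could precompute such scores offline and ship only the resulting domain-to-score table, so that lookups at page-load time need no network request. That deployment detail is likewise an assumption about how the on-device design might be realized.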

