CL · AI · May 8

The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment

William Brach, Federico Torrielli, Stine Lyngsø Beltoft, Annemette Brok Pirchert, Peter Schneider-Kamp, Lukas Galke Poech

arXiv:2605.07462 · 92.2

Predicted impact: top 23% in CL · last 90 days · Originality: Synthesis-oriented

AI Analysis

For AI safety researchers, this work provides an empirical assessment of risks from AI-generated content platforms, but the findings are largely negative (no unique danger) and incremental.

This paper introduces the Moltbook Files, a dataset of 232k posts and 2.2M comments from an AI agent platform, and analyzes its properties and risks. Fine-tuning on this data reduces model truthfulness from 0.366 to 0.187, but a similar reduction occurs with Reddit data, suggesting Moltbook is a harmless slopocalypse.
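To put the reported drop in perspective, the short calculation below derives the absolute and relative change from the two truthfulness scores quoted above. It is only arithmetic on those two numbers; the variable names are illustrative and nothing here reproduces the paper's evaluation.

```python
# Truthfulness scores as quoted in the summary above.
baseline_score = 0.366   # before fine-tuning
finetuned_score = 0.187  # after fine-tuning on Moltbook data

# Absolute drop in score, and the same drop as a fraction of the baseline.
absolute_drop = baseline_score - finetuned_score
relative_drop = absolute_drop / baseline_score

print(f"absolute drop: {absolute_drop:.3f}")  # absolute drop: 0.179
print(f"relative drop: {relative_drop:.1%}")  # relative drop: 48.9%
```

In other words, the fine-tuned model's score is roughly half the baseline, which is why the size-matched Reddit control matters for interpreting the result.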

Moltbook is a Reddit-like platform where OpenClaw agents post, comment, and vote at scale, a so-far unprecedented incident that raises serious safety concerns. With the aim of studying emergent behavior in populations of agents, we release the Moltbook Files, a dataset of 232k posts and 2.2M comments covering the platform's first 12 days, processed through a pipeline to identify and remove personally identifiable information (PII). We analyze community structure, authorship, lexical properties, sentiment, topics, semantic geometry, and comment interaction. To understand how Moltbook data could affect the next generation of language models, we fine-tune Qwen2.5-14B-Instruct on the Moltbook Files at three adaptation levels. Our PII pipeline reveals that agents post API keys, passwords, and BIP39 seed phrases on Moltbook, a publicly indexed platform. Overall sentiment is mostly neutral to mildly positive (66.6% neutral, 19.5% positive), and the content shows a tendency toward self-referential linking. We find that fine-tuning on Moltbook data reduces truthfulness from 0.366 to 0.187. However, a model fine-tuned on a size-matched Reddit dataset produces a comparable decrease. Moltbook thus seems to be more of a harmless slopocalypse. Still, tail risks remain, including agent affordances, contamination of future crawls through self-links, and potential transfer of traits to the next generation of language models. More broadly, our findings highlight the importance of control baselines in emergent misalignment evaluations.
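The abstract does not describe how the PII pipeline works, so the following is only a minimal sketch of what regex-based redaction of the three secret types it mentions could look like. Every pattern and the `redact` function are hypothetical: the API-key prefixes are made up for illustration, and the seed-phrase rule is a rough heuristic (a run of 12 to 24 lowercase words of 3 to 8 letters, the length range of BIP39 English words) that does not check the actual wordlist and can produce false positives on ordinary text.

```python
import re

# Hypothetical patterns for illustration only; not the paper's pipeline.
# API-key-like tokens: an assumed prefix followed by 20+ alphanumerics.
API_KEY_RE = re.compile(r"\b(?:sk|pk|key)[-_][A-Za-z0-9]{20,}\b")

# Heuristic for BIP39-style seed phrases: 12 to 24 consecutive lowercase
# words of 3-8 letters. A real check would validate against the wordlist.
SEED_PHRASE_RE = re.compile(r"\b(?:[a-z]{3,8} ){11,23}[a-z]{3,8}\b")

# Inline credentials such as "password: hunter2" or "pwd=abc".
PASSWORD_RE = re.compile(r"(?i)\b(password|passwd|pwd)\s*[:=]\s*\S+")


def redact(text: str) -> str:
    """Replace suspected secrets with placeholder tokens."""
    text = API_KEY_RE.sub("[REDACTED_API_KEY]", text)
    text = SEED_PHRASE_RE.sub("[REDACTED_SEED_PHRASE]", text)
    # Keep the field name so the redacted text stays readable.
    text = PASSWORD_RE.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)
    return text


post = "my key is sk-ABCDEFGHIJKLMNOPQRSTUVWX and password: hunter2"
print(redact(post))
```

A production pipeline would likely combine such patterns with wordlist and checksum validation for seed phrases and with entropy-based detection for keys; this sketch leaves those out to stay short.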
