Who, Why, and How: Disentangling the Effects of Moderation Source, Context, and Language on Post-Removal Behavior
For platform designers and moderation researchers, this work provides empirical evidence that moderation source, context, and language interact to shape user behavior, challenging assumptions about human moderation advantages and highlighting the need for context-adaptive strategies.
This study analyzes 11.8 million Reddit moderation events to show that bot moderation yields higher compliance and lower self-censorship than human or modteam moderation, and that linguistic strategies effective for minor violations can backfire for serious ones, with 33 of 480 linguistic interactions significant after FDR correction.
Content moderation is a central mechanism through which platforms attempt to balance user engagement with community governance. Yet existing research has largely treated moderation as a uniform intervention, overlooking how moderator source, violation context, and linguistic style jointly shape user behavior. Drawing on the Human--AI Interaction Theory of Interactive Media Effects (HAII-TIME), this study examines how these three dimensions produce divergent post-moderation behavioral trajectories in a large-scale observational dataset of 11,795,036 moderation events across 9,285,410 users and 61,261 subreddits on Reddit (2021--2025). Using probabilistic behavioral classification, ANOVA, and OLS regression with PCA-derived linguistic features, we find that bot moderation consistently produces higher compliance and lower self-censorship than human or modteam moderation, challenging the assumption that human agency cues are inherently advantageous. Modteam moderation produces the strongest self-censorship effects, suggesting that institutional depersonalization is a meaningful driver of behavioral withdrawal. Violation severity emerges as a critical contingency: linguistic strategies effective in routine contexts -- elaborated explanation, community-scale appeals, direct personal address -- can backfire for serious violations, whereas prosocially framed and emotionally emphatic messages become most effective when stakes are highest. Of 480 linguistic interactions tested, 33 survive FDR correction. These findings extend HAII-TIME by introducing violation salience as a moderator of cue-based processing, and offer empirical grounding for context-adaptive moderation design.