CY | AI | CL | CR | Apr 14, 2025

RealHarm: A Collection of Real-World Language Model Application Failures

arXiv:2504.10277v1 | 7 citations | h-index: 5 | Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for empirical evidence on AI application failures for deployers and regulators, though it is incremental in building on existing harm frameworks.

The paper introduces RealHarm, a dataset of annotated problematic interactions drawn from real-world failures of language model applications. It finds that reputational damage is the predominant organizational harm and misinformation the most common hazard category, and that current guardrails leave significant gaps in protection.

Language model deployments in consumer-facing applications introduce numerous risks. While existing research on harms and hazards of such applications follows top-down approaches derived from regulatory frameworks and theoretical analyses, empirical evidence of real-world failure modes remains underexplored. In this work, we introduce RealHarm, a dataset of annotated problematic interactions with AI agents built from a systematic review of publicly reported incidents. Analyzing harms, causes, and hazards specifically from the deployer's perspective, we find that reputational damage constitutes the predominant organizational harm, while misinformation emerges as the most common hazard category. We empirically evaluate state-of-the-art guardrails and content moderation systems to probe whether such systems would have prevented the incidents, revealing a significant gap in the protection of AI applications.

Code Implementations: 1 repo
