CL · AI · CY · LG · Apr 21, 2025

Values in the Wild: Discovering and Analyzing Values in Real-World Language Model Interactions

Stanford
arXiv:2504.15236v1 · 48 citations · h-index: 32
Originality: Incremental advance
AI Analysis

This work addresses the need for grounded evaluation of AI values in deployment, providing a foundation for more informed design of value-aligned systems.

The researchers empirically identify the values that AI assistants express in real-world interactions: they discover and categorize 3,307 values from Claude 3 and 3.5 models and analyze how those values vary with context.

AI assistants can impart value judgments that shape people's decisions and worldviews, yet little is known empirically about what values these systems rely on in practice. To address this, we develop a bottom-up, privacy-preserving method to extract the values (normative considerations stated or demonstrated in model responses) that Claude 3 and 3.5 models exhibit in hundreds of thousands of real-world interactions. We empirically discover and taxonomize 3,307 AI values and study how they vary by context. We find that Claude expresses many practical and epistemic values, and typically supports prosocial human values while resisting values like "moral nihilism". While some values appear consistently across contexts (e.g. "transparency"), many are more specialized and context-dependent, reflecting the diversity of human interlocutors and their varied contexts. For example, "harm prevention" emerges when Claude resists users, "historical accuracy" when responding to queries about controversial events, "healthy boundaries" when asked for relationship advice, and "human agency" in technology ethics discussions. By providing the first large-scale empirical mapping of AI values in deployment, our work creates a foundation for more grounded evaluation and design of values in AI systems.

