Does Claude's Constitution Have a Culture?

arXiv:2603.28123


AI Analysis

The findings highlight a risk that constitutional alignment may codify cultural biases in AI, which would affect users worldwide and call for more representative constitution-authoring processes.

The study investigated whether Constitutional AI (CAI) reflects cultural biases by evaluating Anthropic's Claude Sonnet on 55 World Values Survey items. It found that Claude's value profile most closely resembles those of Northern European and Anglophone countries and, on most items, extends beyond the range of all surveyed populations. Cultural context supplied by users changed Claude's rhetorical framing but not its substantive positions.

Constitutional AI (CAI) aligns language models with explicitly stated normative principles, offering a transparent alternative to implicit alignment through human feedback alone. However, because constitutions are authored by specific groups of people, the resulting models may reflect particular cultural perspectives. We investigate this possibility by evaluating Anthropic's Claude Sonnet on 55 World Values Survey items, selected for high cross-cultural variance across six value domains and administered as both direct survey questions and naturalistic advice-seeking scenarios. Comparing Claude's responses to country-level data from 90 nations, we find that Claude's value profile most closely resembles those of Northern European and Anglophone countries, but on a majority of items extends beyond the range of all surveyed populations. When users provide cultural context, Claude adjusts its rhetorical framing but not its substantive value positions, with effect sizes indistinguishable from zero across all twelve tested countries. An ablation removing the system prompt increases refusals but does not alter the values expressed when responses are given, and replication on a smaller model (Claude Haiku) confirms the same cultural profile across model sizes. These findings suggest that when a constitution is authored within the same cultural tradition that dominates the training data, constitutional alignment may codify existing cultural biases rather than correct them, producing a value floor that surface-level interventions cannot meaningfully shift. We discuss the compounding nature of this risk and the need for globally representative constitution-authoring processes.
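
The comparison described in the abstract (finding the countries whose profile is closest to the model's, and counting items where the model falls outside the range of all surveyed populations) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not state the distance metric (Euclidean distance is used as a stand-in), the function names are hypothetical, and the numbers are made-up toy values rather than World Values Survey data or results from the paper.

```python
from math import sqrt

# Hypothetical toy data, NOT from the paper: mean responses on a 1-10 scale
# for three made-up survey items in three made-up countries.
country_means = {
    "Country A": [2.0, 3.0, 4.0],
    "Country B": [5.0, 5.0, 5.0],
    "Country C": [8.0, 7.0, 6.0],
}

# Hypothetical model responses to the same three items.
model_profile = [9.0, 6.0, 5.0]


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles (assumed metric)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_country(profile, means):
    """Return the country whose mean profile is closest to the given profile."""
    return min(means, key=lambda name: euclidean(profile, means[name]))


def items_outside_range(profile, means):
    """Return indices of items where the profile lies beyond every country mean."""
    outside = []
    for i, value in enumerate(profile):
        column = [scores[i] for scores in means.values()]
        if value < min(column) or value > max(column):
            outside.append(i)
    return outside


print(nearest_country(model_profile, country_means))
print(items_outside_range(model_profile, country_means))
```

In this toy setup the model sits closest to one country overall while still exceeding every country's mean on one item, which is the pattern the abstract reports at scale: a nearest cultural cluster plus positions outside the surveyed range.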

