Cultural Compass: A Framework for Organizing Societal Norms to Detect Violations in Human-AI Conversations
This work addresses the need for nuanced, context-sensitive evaluation of cultural norm adherence in human-AI conversations, which is crucial for making generative AI models safer and more useful across diverse societies.
The paper tackles the problem of evaluating AI models' adherence to sociocultural norms in cross-cultural contexts by introducing a taxonomy that clarifies norm contexts, specifications, and mechanisms. Its exploratory analyses suggest that state-of-the-art models frequently violate norms, with violation rates varying by model, interactional context, and country.
Generative AI models ought to be useful and safe across cross-cultural contexts. One critical step toward this goal is understanding how AI models adhere to sociocultural norms. While this challenge has gained attention in NLP, existing work lacks both nuance and coverage in understanding and evaluating models' norm adherence. We address these gaps by introducing a taxonomy of norms that clarifies their contexts (e.g., distinguishing between human-human norms that models should recognize and human-AI interactional norms that apply to the human-AI interaction itself), specifications (e.g., relevant domains), and mechanisms (e.g., modes of enforcement). We demonstrate how our taxonomy can be operationalized to automatically evaluate models' norm adherence in naturalistic, open-ended settings. Our exploratory analyses suggest that state-of-the-art models frequently violate norms, though violation rates vary by model, interactional context, and country. We further show that violation rates also vary by prompt intent and situational framing. Our taxonomy and demonstrative evaluation pipeline enable nuanced, context-sensitive evaluation of cultural norm adherence in realistic settings.
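As a rough illustration of how the three taxonomy facets and the per-group violation rates described above might be represented, here is a minimal sketch. It is not the paper's pipeline: the class names, field names, and the `violation_rates` helper are all hypothetical, the string labels are placeholders, and the step that decides whether a response violates a norm is assumed to happen elsewhere and is represented only by a boolean.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class NormContext(Enum):
    """Context facet: the two kinds of norm the abstract distinguishes."""
    HUMAN_HUMAN = "human-human"  # norms the model should recognize
    HUMAN_AI = "human-AI"        # norms applying to the human-AI interaction itself


@dataclass(frozen=True)
class Norm:
    """Hypothetical record for one norm; field names are assumptions."""
    description: str
    context: NormContext
    domain: str        # specification facet (placeholder label)
    enforcement: str   # mechanism facet (placeholder label)
    country: str


@dataclass(frozen=True)
class Judgment:
    """One evaluated response; `violated` comes from an external judge (assumed)."""
    model: str
    norm: Norm
    violated: bool


def violation_rates(judgments, key):
    """Return {group: fraction of judgments marked violated}, grouped by key(judgment)."""
    totals = defaultdict(int)
    violations = defaultdict(int)
    for j in judgments:
        group = key(j)
        totals[group] += 1
        violations[group] += int(j.violated)
    return {group: violations[group] / totals[group] for group in totals}


if __name__ == "__main__":
    # Toy placeholder data, not results from the paper.
    norm_a = Norm("placeholder norm A", NormContext.HUMAN_HUMAN, "domain-1", "mode-1", "Country A")
    norm_b = Norm("placeholder norm B", NormContext.HUMAN_AI, "domain-2", "mode-2", "Country B")
    toy = [
        Judgment("model-x", norm_a, True),
        Judgment("model-x", norm_b, False),
        Judgment("model-y", norm_a, False),
    ]
    print(violation_rates(toy, key=lambda j: j.model))
    print(violation_rates(toy, key=lambda j: j.norm.country))
```

Passing a different `key` function (for example `lambda j: j.norm.context`) regroups the same judgments by interactional context, which mirrors the kind of breakdown by model, context, and country that the abstract reports.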