CL · Jun 14, 2025

Exploring Cultural Variations in Moral Judgments with Large Language Models

arXiv:2506.12433v1 · 1 citation · h-index: 5
Originality: Incremental advance
AI Analysis

This paper addresses cultural bias in LLMs, a problem relevant to researchers and developers, though its contribution is incremental because it builds on existing methods for bias analysis.

The paper examined whether large language models (LLMs) can capture culturally diverse moral values by comparing their outputs to cross-cultural survey data, finding that advanced instruction-tuned models like GPT-4o achieved substantially higher correlations with human judgments than earlier or smaller models.

Large Language Models (LLMs) have shown strong performance across many tasks, but their ability to capture culturally diverse moral values remains unclear. In this paper, we examine whether LLMs can mirror variations in moral attitudes reported by two major cross-cultural surveys: the World Values Survey and the PEW Research Center's Global Attitudes Survey. We compare smaller, monolingual, and multilingual models (GPT-2, OPT, BLOOMZ, and Qwen) with more recent instruction-tuned models (GPT-4o, GPT-4o-mini, Gemma-2-9b-it, and Llama-3.3-70B-Instruct). Using log-probability-based moral justifiability scores, we correlate each model's outputs with survey data covering a broad set of ethical topics. Our results show that many earlier or smaller models often produce near-zero or negative correlations with human judgments. In contrast, advanced instruction-tuned models (including GPT-4o and GPT-4o-mini) achieve substantially higher positive correlations, suggesting they better reflect real-world moral attitudes. While scaling up model size and using instruction tuning can improve alignment with cross-cultural moral norms, challenges remain for certain topics and regions. We discuss these findings in relation to bias analysis, training data diversity, and strategies for improving the cultural sensitivity of LLMs.
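The abstract names two steps, a log-probability-based justifiability score and a correlation against survey data, without giving formulas. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes the score is the mean gap between the log-probabilities a model assigns to paired "justifiable" and "unjustifiable" phrasings of a prompt, and that the comparison uses Pearson correlation. The function names `justifiability_score` and `pearson_r` are hypothetical, and the log-probabilities are assumed to have been obtained from a model beforehand.

```python
import math
from typing import Sequence


def justifiability_score(logp_justifiable: Sequence[float],
                         logp_unjustifiable: Sequence[float]) -> float:
    """Mean log-probability gap over paired prompts (assumed scoring rule).

    Each position pairs the log-probability of a 'justifiable' phrasing with
    that of the matching 'unjustifiable' phrasing for one country and topic.
    A positive result means the model favours the 'justifiable' phrasing.
    """
    if len(logp_justifiable) != len(logp_unjustifiable) or not logp_justifiable:
        raise ValueError("need equal-length, non-empty log-probability lists")
    gaps = [a - b for a, b in zip(logp_justifiable, logp_unjustifiable)]
    return sum(gaps) / len(gaps)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between model scores and survey scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Under these assumptions, one score would be computed per country-topic pair, and `pearson_r` would then compare the list of model scores with the corresponding survey values. A near-zero or negative result corresponds to the weak alignment the abstract reports for earlier or smaller models.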

