CL · Apr 18, 2024

NormAd: A Framework for Measuring the Cultural Adaptability of Large Language Models

AI2, CMU
arXiv:2404.12464v1033 citations · h-index: 49 · NAACL
Originality: Incremental advance
AI Analysis

This work addresses the need for LLMs to adapt to diverse global cultures before they can be deployed safely. Its contribution is incremental, since it evaluates cultural adaptability rather than improving it.

The authors assess the cultural adaptability of large language models by introducing NormAd, a framework that measures how well a model judges social acceptability under different cultural norms. They find that LLMs struggle: even in the simplest setting, the best models score below 82% accuracy, while humans score above 95%.

To be effectively and safely deployed to global user populations, large language models (LLMs) may need to adapt outputs to user values and cultures, not just know about them. We introduce NormAd, an evaluation framework to assess LLMs' cultural adaptability, specifically measuring their ability to judge social acceptability across varying levels of cultural norm specificity, from abstract values to explicit social norms. As an instantiation of our framework, we create NormAd-Eti, a benchmark of 2.6k situational descriptions representing social-etiquette related cultural norms from 75 countries. Through comprehensive experiments on NormAd-Eti, we find that LLMs struggle to accurately judge social acceptability across these varying degrees of cultural contexts and show stronger adaptability to English-centric cultures over those from the Global South. Even in the simplest setting where the relevant social norms are provided, the best LLMs' performance (< 82%) lags behind humans (> 95%). In settings with abstract values and country information, model performance drops substantially (< 60%), while human accuracy remains high (> 90%). Furthermore, we find that models are better at recognizing socially acceptable versus unacceptable situations. Our findings showcase the current pitfalls in socio-cultural reasoning of LLMs which hinder their adaptability for global audiences.
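The abstract reports accuracy separately for each level of cultural context and notes a gap between acceptable and unacceptable situations. The sketch below shows one way such a breakdown could be computed. It is not the authors' evaluation code: the record fields (`context`, `gold`, `pred`), the context names, the label strings, and the helper `accuracy_by` are all hypothetical, and the six records are invented purely for illustration.

```python
from collections import defaultdict

# Hypothetical records, invented for illustration only (not NormAd-Eti data).
# "context" names the level of cultural context shown to the model,
# "gold" is the reference judgment, and "pred" is the model's judgment.
records = [
    {"context": "country", "gold": "acceptable", "pred": "acceptable"},
    {"context": "country", "gold": "unacceptable", "pred": "acceptable"},
    {"context": "value", "gold": "acceptable", "pred": "acceptable"},
    {"context": "value", "gold": "unacceptable", "pred": "unacceptable"},
    {"context": "norm", "gold": "unacceptable", "pred": "unacceptable"},
    {"context": "norm", "gold": "acceptable", "pred": "acceptable"},
]


def accuracy_by(records, key):
    """Group records by `key` and return the fraction judged correctly per group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r[key]] += 1
        correct[r[key]] += int(r["gold"] == r["pred"])
    return {group: correct[group] / total[group] for group in total}


# Accuracy per context level (country only, abstract value, explicit norm).
per_context = accuracy_by(records, "context")

# Accuracy per gold label, to expose any acceptable/unacceptable asymmetry.
per_label = accuracy_by(records, "gold")

print(per_context)
print(per_label)
```

Grouping by the gold label rather than the predicted label keeps the second breakdown comparable to a per-class recall, which is one reasonable reading of "better at recognizing socially acceptable versus unacceptable situations".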

Code Implementations: 1 repo
