CL Oct 21, 2024

Susu Box or Piggy Bank: Assessing Cultural Commonsense Knowledge between Ghana and the U.S.

Christabel Acquaye, Haozhe An, Rachel Rudinger

arXiv:2410.16451v26.112 citations h-index: 25 EMNLP

Originality: Incremental advance

AI Analysis

The paper addresses the need for culturally adaptable models and evaluations that serve diverse English-speaking populations, though the advance is incremental because it builds on existing work on cultural commonsense.

The paper tackles the problem of culturally-contingent commonsense knowledge by introducing AMAMMERε, a test set of 525 multiple-choice questions for evaluating English LLMs in Ghanaian and U.S. contexts. It finds that models uniformly prefer U.S. answers and perform better in U.S. contexts than in Ghanaian ones.

Recent work has highlighted the culturally-contingent nature of commonsense knowledge. We introduce AMAMMER$ε$, a test set of 525 multiple-choice questions designed to evaluate the commonsense knowledge of English LLMs, relative to the cultural contexts of Ghana and the United States. To create AMAMMER$ε$, we select a set of multiple-choice questions (MCQs) from existing commonsense datasets and rewrite them in a multi-stage process involving surveys of Ghanaian and U.S. participants. In three rounds of surveys, participants from both pools are solicited to (1) write correct and incorrect answer choices, (2) rate individual answer choices on a 5-point Likert scale, and (3) select the best answer choice from the newly-constructed MCQ items, in a final validation step. By engaging participants at multiple stages, our procedure ensures that participant perspectives are incorporated both in the creation and validation of test items, resulting in high levels of agreement within each pool. We evaluate several off-the-shelf English LLMs on AMAMMER$ε$. Uniformly, models prefer answer choices that align with the preferences of U.S. annotators over those of Ghanaian annotators. Additionally, when test items specify a cultural context (Ghana or the U.S.), models exhibit some ability to adapt, but performance is consistently better in U.S. contexts than in Ghanaian ones. As large resources are devoted to the advancement of English LLMs, our findings underscore the need for culturally adaptable models and evaluations to meet the needs of diverse English-speaking populations around the world.
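The comparison of model answers with each annotator pool's preferred answer can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the `Item` record, the `alignment_rates` function, the three-choice layout, and every item and prediction below are hypothetical and made up for illustration. It only shows one plausible way to measure how often a model's chosen option matches the answer preferred by Ghanaian versus U.S. annotators.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One multiple-choice item with a preferred answer per annotator pool (hypothetical schema)."""
    question: str
    choices: tuple
    ghana_answer: int  # index of the choice preferred by Ghanaian annotators
    us_answer: int     # index of the choice preferred by U.S. annotators


def alignment_rates(items, predictions):
    """Return the fraction of predictions that match each pool's preferred answer."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    if not items:
        raise ValueError("at least one item is required")
    n = len(items)
    ghana = sum(p == it.ghana_answer for it, p in zip(items, predictions)) / n
    us = sum(p == it.us_answer for it, p in zip(items, predictions)) / n
    return {"ghana": ghana, "us": us}


# Made-up items for illustration only; none are taken from the dataset.
items = [
    Item("Where might a child keep saved coins?", ("susu box", "piggy bank", "shoe"), 0, 1),
    Item("placeholder question 2", ("a", "b", "c"), 2, 2),
    Item("placeholder question 3", ("a", "b", "c"), 1, 0),
    Item("placeholder question 4", ("a", "b", "c"), 0, 2),
]
predictions = [1, 2, 0, 1]  # hypothetical model outputs, as choice indices

rates = alignment_rates(items, predictions)
print(rates)
```

On this toy input the U.S. alignment rate is higher than the Ghanaian one, mirroring the direction of the reported finding. The numbers themselves carry no meaning beyond the example.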
