CL · AI · CY · LG · Aug 13, 2025

PakBBQ: A Culturally Adapted Bias Benchmark for QA

arXiv:2508.10186v2 · 3 citations · h-index: 16 · EMNLP
Originality: Synthesis-oriented
AI Analysis

This work addresses bias in LLMs for users in Pakistan, but it is incremental: it extends an existing benchmark (BBQ) to a new cultural context.

The authors tackle bias in Large Language Models (LLMs) for low-resource languages and regional contexts by introducing PakBBQ, a question-answering bias benchmark culturally adapted to Pakistan. They find that disambiguating the context improves accuracy by 12% on average, that models show stronger counter-bias behavior in Urdu than in English, and that negatively framed questions reduce stereotypical responses.

With the widespread adoption of Large Language Models (LLMs) across various applications, it is imperative to ensure their fairness across all user communities. However, most LLMs are trained and evaluated on Western-centric data, with little attention paid to low-resource languages and regional contexts. To address this gap, we introduce PakBBQ, a culturally and regionally adapted extension of the original Bias Benchmark for Question Answering (BBQ) dataset. PakBBQ comprises over 214 templates and 17,180 QA pairs in both English and Urdu, covering eight bias dimensions relevant in Pakistan: age, disability, appearance, gender, socio-economic status, religion, regional affiliation, and language formality. We evaluate multiple multilingual LLMs under both ambiguous and explicitly disambiguated contexts, as well as under negative versus non-negative question framings. Our experiments reveal (i) an average accuracy gain of 12% with disambiguation, (ii) consistently stronger counter-bias behavior in Urdu than in English, and (iii) marked framing effects that reduce stereotypical responses when questions are posed negatively. These findings highlight the importance of contextualized benchmarks and simple prompt-engineering strategies for bias mitigation in low-resource settings.
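To make the evaluation conditions concrete, here is a minimal sketch of how BBQ-style items might be scored. It assumes a hypothetical record schema: `BBQItem`, `accuracy_by`, the `UNKNOWN` label, and the option names are illustrative only and are not PakBBQ's actual data format. Following the original BBQ design, the correct answer in an ambiguous context is the "unknown" option, while a disambiguated context supplies enough information to name a specific person. Grouping predictions by context type or by question polarity gives the per-condition accuracies that findings (i) and (iii) compare. The toy data below is invented and does not reproduce the paper's numbers.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical label for the "cannot be determined" answer option.
UNKNOWN = "unknown"


@dataclass
class BBQItem:
    """One multiple-choice QA instance (hypothetical schema, for illustration)."""
    category: str      # bias dimension, e.g. "gender" or "religion"
    language: str      # "en" or "ur"
    context_type: str  # "ambiguous" or "disambiguated"
    polarity: str      # "negative" or "non-negative" question framing
    gold: str          # correct option
    prediction: str    # option chosen by the model


def accuracy_by(items, key):
    """Group items by the attribute named `key` and return accuracy per group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        group = getattr(item, key)
        total[group] += 1
        correct[group] += int(item.prediction == item.gold)
    return {group: correct[group] / total[group] for group in total}


# Toy predictions: in ambiguous contexts the gold answer is UNKNOWN, so
# picking a specific person there counts as an error.
items = [
    BBQItem("gender", "en", "ambiguous", "negative", UNKNOWN, UNKNOWN),
    BBQItem("gender", "en", "ambiguous", "non-negative", UNKNOWN, "person_a"),
    BBQItem("gender", "en", "disambiguated", "negative", "person_b", "person_b"),
    BBQItem("gender", "en", "disambiguated", "non-negative", "person_a", "person_a"),
]

by_context = accuracy_by(items, "context_type")
by_polarity = accuracy_by(items, "polarity")
print(by_context)   # {'ambiguous': 0.5, 'disambiguated': 1.0}
print(by_polarity)  # {'negative': 1.0, 'non-negative': 0.5}
```

With this toy data, the disambiguated contexts score higher than the ambiguous ones, and the negatively framed questions score higher than the non-negative ones. Grouping by `language` or `category` works the same way, which is one straightforward way to compare English and Urdu results.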

