CLJul 15, 2025

EsBBQ and CaBBQ: The Spanish and Catalan Bias Benchmarks for Question Answering

Valle Ruiz-Fernández, Mario Mina, Júlia Falcão, Luis Vasquez-Reina, Anna Sallés, Aitor Gonzalez-Agirre, Olatz Perez-de-Viñaspre

arXiv:2507.11216v14.91 citationsh-index: 21

Originality Synthesis-oriented

AI Analysis

This addresses the problem of evaluating social biases in LLMs for Spanish and Catalan languages and Spain's social context, providing new benchmarks but is incremental as it adapts an existing method to new data.

The paper tackles the lack of social bias evaluation resources for non-English languages and non-U.S. contexts by introducing EsBBQ and CaBBQ, Spanish and Catalan benchmarks for question answering, showing that models often fail in ambiguous scenarios and that high accuracy correlates with greater reliance on social biases.

Previous literature has largely shown that Large Language Models (LLMs) perpetuate social biases learnt from their pre-training data. Given the notable lack of resources for social bias evaluation in languages other than English, and for social contexts outside of the United States, this paper introduces the Spanish and the Catalan Bias Benchmarks for Question Answering (EsBBQ and CaBBQ). Based on the original BBQ, these two parallel datasets are designed to assess social bias across 10 categories using a multiple-choice QA setting, now adapted to the Spanish and Catalan languages and to the social context of Spain. We report evaluation results on different LLMs, factoring in model family, size and variant. Our results show that models tend to fail to choose the correct answer in ambiguous scenarios, and that high QA accuracy often correlates with greater reliance on social biases.

View on arXiv PDF

Similar