CL · Sep 28, 2025

BTC-SAM: Leveraging LLMs for Generation of Bias Test Cases for Sentiment Analysis Models

Zsolt T. Kardkovacs, Lynda Djennane, Anna Field, Boualem Benatallah, Yacine Gaci, Fabio Casati, Walid Gaaloul

arXiv:2509.24101v2 · 1 citation · h-index: 62 · EMNLP

Originality: Incremental advance

AI Analysis

The work addresses the challenge of efficiently testing sentiment analysis models for bias, which matters to developers and users who want to mitigate harmful real-world impacts. The contribution is incremental, since it builds on existing LLM capabilities.

The paper tackles the problem of identifying social biases in sentiment analysis models by introducing BTC-SAM, a framework that uses large language models to generate high-quality, diverse test cases, resulting in better test coverage compared to base prompting methods.

Sentiment Analysis (SA) models harbor inherent social biases that can be harmful in real-world applications. These biases are identified by examining the output of SA models for sentences that only vary in the identity groups of the subjects. Constructing natural, linguistically rich, relevant, and diverse sets of sentences that provide sufficient coverage over the domain is expensive, especially when addressing a wide range of biases: it requires domain experts and/or crowd-sourcing. In this paper, we present a novel bias testing framework, BTC-SAM, which generates high-quality test cases for bias testing in SA models with minimal specification using Large Language Models (LLMs) for the controllable generation of test sentences. Our experiments show that relying on LLMs can provide high linguistic variation and diversity in the test sentences, thereby offering better test coverage compared to base prompting methods even for previously unseen biases.
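The identification step described in the abstract (compare model outputs on sentences that differ only in the identity group) can be illustrated with a minimal sketch. This is not the BTC-SAM framework and does not use an LLM: the template is hand-written, and `toy_sentiment`, `LEXICON`, and `bias_gaps` are hypothetical names introduced here, with a deliberately biased toy lexicon standing in for a real SA model.

```python
from itertools import combinations

# Hypothetical stand-in for a real sentiment model: a tiny lexicon scorer.
# The negative weight on "elderly" is a planted bias so the check finds something.
LEXICON = {"great": 1.0, "friendly": 0.5, "rude": -1.0, "elderly": -0.3}


def toy_sentiment(sentence):
    """Return the sum of lexicon weights for the words in the sentence."""
    words = sentence.lower().replace(".", "").split()
    return sum(LEXICON.get(word, 0.0) for word in words)


def bias_gaps(template, groups, model, tolerance=1e-9):
    """Score one template per identity group and report unequal pairs.

    The sentences differ only in the identity term, so any score gap
    larger than `tolerance` is attributable to that term.
    """
    scores = {g: model(template.format(group=g)) for g in groups}
    gaps = []
    for a, b in combinations(groups, 2):
        diff = abs(scores[a] - scores[b])
        if diff > tolerance:
            gaps.append((a, b, diff))
    return gaps


if __name__ == "__main__":
    template = "The {group} neighbor was friendly."
    for a, b, diff in bias_gaps(template, ["young", "elderly"], toy_sentiment):
        print(f"{a} vs {b}: score gap {diff:.2f}")
```

In practice the model would be the SA system under test, and the abstract's point is that writing enough natural, varied templates by hand is the expensive part that LLM-based generation is meant to replace.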
