CL CY HC Apr 4, 2025

StereoDetect: Detecting Stereotypes and Anti-stereotypes the Correct Way Using Social Psychological Underpinnings

Kaustubh Shivshankar Shejole, Pushpak Bhattacharyya

arXiv:2504.03352v36.71 citations h-index: 3 Has Code EMNLP

Originality: Incremental advance

AI Analysis

This work addresses a critical gap in Responsible AI by improving the detection of stereotypes and anti-stereotypes. It is an incremental advance: it builds on existing research, adding new definitions and a new benchmark.

The study tackles the detection of stereotypes and anti-stereotypes by proposing a five-tuple definition and a detection framework grounded in social psychology. These underpin the StereoDetect benchmark dataset, which revealed that sub-10B language models and GPT-4o frequently misclassify anti-stereotypes and fail to recognize neutral overgeneralizations.

Stereotypes are known to have very harmful effects, making their detection critically important. However, current research predominantly focuses on detecting and evaluating stereotypical biases, leaving the study of stereotypes themselves in its early stages. Our study revealed that many works fail to clearly distinguish between stereotypes and stereotypical biases, which has significantly slowed progress in this area. Stereotype and anti-stereotype detection requires social knowledge; hence, it is one of the most difficult problems in Responsible AI. We investigate this task, proposing a five-tuple definition and precise terminology that disentangles stereotypes, anti-stereotypes, stereotypical bias, and general bias. We provide a conceptual framework grounded in social psychology for reliable detection. We identify key shortcomings in existing benchmarks for this task. To address these gaps, we developed StereoDetect, a well-curated, definition-aligned benchmark dataset designed for this task. We show that sub-10B language models and GPT-4o frequently misclassify anti-stereotypes and fail to recognize neutral overgeneralizations. We demonstrate StereoDetect's effectiveness through multiple qualitative and quantitative comparisons with existing benchmarks and models fine-tuned on them. The dataset and code are available at https://github.com/KaustubhShejole/StereoDetect.
