CL, Sep 17, 2024

HEARTS: A Holistic Framework for Explainable, Sustainable and Robust Text Stereotype Detection

arXiv:2409.11579v3 · 10 citations · h-index: 16
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of robust and explainable stereotype detection for AI systems, though it appears incremental because it builds on existing methods such as BERT and SHAP.

The authors tackle the problem that LLMs detect stereotypes inaccurately by introducing the HEARTS framework, which improves model performance and provides transparent explanations. BERT models fine-tuned on their new EMGSD dataset outperform models fine-tuned on any of its individual component datasets.

Stereotypes are generalised assumptions about societal groups, and even state-of-the-art LLMs using in-context learning struggle to identify them accurately. Due to the subjective nature of stereotypes, where what constitutes a stereotype can vary widely depending on cultural, social, and individual perspectives, robust explainability is crucial. Explainable models ensure that these nuanced judgments can be understood and validated by human users, promoting trust and accountability. We address these challenges by introducing HEARTS (Holistic Framework for Explainable, Sustainable, and Robust Text Stereotype Detection), a framework that enhances model performance, minimises carbon footprint, and provides transparent, interpretable explanations. We establish the Expanded Multi-Grain Stereotype Dataset (EMGSD), comprising 57,201 labelled texts across six groups, including under-represented demographics like LGBTQ+ and regional stereotypes. Ablation studies confirm that BERT models fine-tuned on EMGSD outperform those trained on individual components. We then analyse a fine-tuned, carbon-efficient ALBERT-V2 model using SHAP to generate token-level importance values, ensuring alignment with human understanding, and calculate explainability confidence scores by comparing SHAP and LIME outputs...
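The abstract states that explainability confidence scores are calculated by comparing SHAP and LIME outputs, but it does not say how the comparison is computed. The following is a minimal sketch, assuming both explainers return one importance value per token for the same tokenization of a text. The helper names (`cosine_similarity`, `pearson_r`, `js_divergence`, `agreement`) are hypothetical, and the three agreement metrics are an assumption for illustration, not necessarily the paper's exact definition of the confidence score.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two token-importance vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # undefined for a zero vector; treat as no agreement
    return dot / (norm_a * norm_b)


def pearson_r(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between the two importance vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # constant vector: correlation undefined
    return cov / math.sqrt(var_a * var_b)


def _to_distribution(v: Sequence[float]) -> List[float]:
    """Normalise absolute importances so they sum to 1 over the tokens."""
    total = sum(abs(x) for x in v)
    if total == 0:
        return [1.0 / len(v)] * len(v)
    return [abs(x) / total for x in v]


def _kl(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence in bits; terms with p == 0 contribute nothing."""
    return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)


def js_divergence(a: Sequence[float], b: Sequence[float]) -> float:
    """Jensen-Shannon divergence (base 2, so 0 = identical, 1 = disjoint)."""
    p, q = _to_distribution(a), _to_distribution(b)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


def agreement(shap_scores: Sequence[float], lime_scores: Sequence[float]) -> Dict[str, float]:
    """Summarise how closely SHAP and LIME agree on per-token importance."""
    if len(shap_scores) != len(lime_scores):
        raise ValueError("SHAP and LIME scores must cover the same tokens")
    return {
        "cosine": cosine_similarity(shap_scores, lime_scores),
        "pearson": pearson_r(shap_scores, lime_scores),
        "js_divergence": js_divergence(shap_scores, lime_scores),
    }


if __name__ == "__main__":
    # Invented scores for a four-token sentence, purely for illustration.
    shap_vals = [0.42, 0.01, 0.30, 0.25]
    lime_vals = [0.38, -0.02, 0.35, 0.20]
    print(agreement(shap_vals, lime_vals))
```

High cosine similarity and correlation together with a low Jensen-Shannon divergence would indicate that the two explainers highlight the same tokens, which is the intuition behind treating SHAP-LIME agreement as a confidence signal for an explanation.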

Code Implementations: 1 repo
