CL · Jul 26, 2025

KLAAD: Refining Attention Mechanisms to Reduce Societal Bias in Generative Language Models

arXiv:2507.19962v1 · 3 citations · EMNLP

Originality: Incremental advance

AI Analysis

The work addresses fairness and harm in generative AI and offers a principled approach to reducing bias, though it appears incremental because it refines existing attention mechanisms.

The paper tackles societal bias in large language models by proposing KLAAD, an attention-based debiasing framework that improves bias mitigation on the BBQ and BOLD benchmarks with minimal impact on language quality.

Large language models (LLMs) often exhibit societal biases in their outputs, prompting ethical concerns regarding fairness and harm. In this work, we propose KLAAD (KL-Attention Alignment Debiasing), an attention-based debiasing framework that implicitly aligns attention distributions between stereotypical and anti-stereotypical sentence pairs without directly modifying model weights. KLAAD introduces a composite training objective combining Cross-Entropy, KL divergence, and Triplet losses, guiding the model to consistently attend across biased and unbiased contexts while preserving fluency and coherence. Experimental evaluation of KLAAD demonstrates improved bias mitigation on both the BBQ and BOLD benchmarks, with minimal impact on language modeling quality. The results indicate that attention-level alignment offers a principled solution for mitigating bias in generative language models.
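The composite objective described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the function names, the KL direction (stereotypical to anti-stereotypical), the use of Euclidean distance in the triplet term, and the weights `lambda_kl` and `lambda_triplet` are all hypothetical, and the toy lists stand in for real attention distributions and representations.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def cross_entropy(probs, target_index, eps=1e-12):
    """Negative log-likelihood of the target token under predicted probabilities."""
    return -math.log(probs[target_index] + eps)

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss: pull positive closer than negative."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def composite_loss(ce, kl, triplet, lambda_kl=1.0, lambda_triplet=1.0):
    """Weighted sum of the three terms; the weights are hypothetical hyperparameters."""
    return ce + lambda_kl * kl + lambda_triplet * triplet

# Toy attention distributions over four tokens for a sentence pair (made-up values).
attn_stereo = [0.70, 0.10, 0.10, 0.10]
attn_anti = [0.40, 0.20, 0.20, 0.20]

kl = kl_divergence(attn_stereo, attn_anti)          # penalizes attention mismatch
ce = cross_entropy([0.1, 0.6, 0.3], target_index=1)  # language modeling term
tri = triplet_loss([0.0, 0.0], [0.1, 0.0], [1.0, 1.0], margin=0.5)

total = composite_loss(ce, kl, tri, lambda_kl=0.5, lambda_triplet=0.5)
```

In this sketch the KL term goes to zero when the two attention distributions match, which is one plausible reading of "aligning attention distributions" across stereotypical and anti-stereotypical inputs; the paper itself should be consulted for the layers, heads, and weighting it actually uses.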
