CL · Oct 7, 2025

LLM Bias Detection and Mitigation through the Lens of Desired Distributions

arXiv:2510.06354v1 · 6 citations · h-index: 2 · EMNLP
Originality: Incremental advance
AI Analysis

This work addresses bias mitigation in LLMs for applications requiring alignment with specific distributions, though it is incremental as it builds on existing fine-tuning approaches.

The paper tackles bias in LLMs by defining it as deviation from a desired distribution, such as equality or real-world data. It proposes a weighted adaptive loss fine-tuning method to align gender-profession outputs, achieving near-complete mitigation under equality and a 30-75% reduction under real-world settings.

Although prior work on bias mitigation has focused on promoting social equality and demographic parity, less attention has been given to aligning LLM outputs with desired distributions. For example, we might want to align a model with real-world distributions to support factual grounding. Thus, we define bias as deviation from a desired distribution, which may be an equal or real-world distribution, depending on application goals. We propose a weighted adaptive loss-based fine-tuning method that aligns an LLM's gender-profession output distribution with the desired distribution while preserving language modeling capability. Using three profession sets (male-dominated, female-dominated, and gender-balanced) derived from U.S. labor statistics (2024), we assess both our adaptive method for reflecting reality and a non-adaptive variant for equality. Across three masked language models, bias is observed under both distributions. We achieve near-complete mitigation under equality and a 30-75% reduction under real-world settings. Autoregressive LLMs show no bias under equality but notable bias under real-world settings, with the Llama Instruct models (3.2-3B, 3.1-8B) achieving a 50-62% reduction.
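
To make "bias as deviation from a desired distribution" concrete, here is a minimal sketch. The abstract does not state the paper's actual metric or loss, so this assumes total variation distance as the deviation measure. The function names and all probabilities below are hypothetical illustrations, not the paper's method or results.

```python
from typing import Dict


def deviation(observed: Dict[str, float], desired: Dict[str, float]) -> float:
    """Total variation distance between two distributions.

    ASSUMPTION: the paper's real bias measure is not given in the abstract;
    total variation distance is used here only as a stand-in.
    """
    keys = set(observed) | set(desired)
    return 0.5 * sum(abs(observed.get(k, 0.0) - desired.get(k, 0.0)) for k in keys)


def relative_reduction(before: float, after: float) -> float:
    """Percentage by which the deviation shrank after mitigation."""
    if before == 0:
        raise ValueError("no bias to reduce: deviation before mitigation is zero")
    return 100.0 * (before - after) / before


# Hypothetical model outputs for one profession (made-up numbers).
observed_before = {"male": 0.90, "female": 0.10}
observed_after = {"male": 0.75, "female": 0.25}

# Two possible desired distributions, depending on the application goal.
equal = {"male": 0.50, "female": 0.50}
real_world = {"male": 0.70, "female": 0.30}  # hypothetical labor share

bias_equal = deviation(observed_before, equal)
bias_real_before = deviation(observed_before, real_world)
bias_real_after = deviation(observed_after, real_world)

print(f"bias vs. equality:   {bias_equal:.2f}")
print(f"bias vs. real world: {bias_real_before:.2f} -> {bias_real_after:.2f}")
print(f"reduction:           {relative_reduction(bias_real_before, bias_real_after):.0f}%")
```

The same model output scores differently against each target, which is why the abstract reports separate results for the equality and real-world settings.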
