CL AI CY LG Feb 11, 2025

Breaking Down Bias: On The Limits of Generalizable Pruning Strategies

arXiv:2502.07771v1 2 citations h-index: 3 FAccT
Originality: Incremental advance
AI Analysis

This research addresses the mitigation of racial bias in LLMs and its implications for legal frameworks governing AI deployment. The advance is incremental because it builds on existing pruning methods.

The study investigated whether model pruning can reduce racial bias in LLMs. It found that pruning reduces bias without major side effects, but that its effectiveness diminishes when a pruning strategy is applied across contexts, indicating that these biases are partly context-specific.

We employ model pruning to examine how LLMs conceptualize racial biases, and whether a generalizable mitigation strategy for such biases appears feasible. Our analysis yields several novel insights. We find that pruning can be an effective method to reduce bias without significantly increasing anomalous model behavior. Neuron-based pruning strategies generally yield better results than approaches that prune entire attention heads. However, our results also show that the effectiveness of either approach quickly deteriorates as pruning strategies become more generalized. For instance, a model that is trained on removing racial biases in the context of financial decision-making generalizes poorly to biases in commercial transactions. Overall, our analysis suggests that racial biases are only partially represented as a general concept within language models. The other part of these biases is highly context-specific, suggesting that generalizable mitigation strategies may be of limited effectiveness. Our findings have important implications for legal frameworks surrounding AI. In particular, they suggest that an effective mitigation strategy should include the allocation of legal responsibility to those who deploy models in a specific use case.
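
To make the distinction between neuron-based and head-based pruning concrete, the following is a minimal NumPy sketch and not the paper's implementation. The function names `prune_neurons` and `prune_heads`, the weight layouts, and the per-unit scores are all hypothetical; the abstract does not say how units are selected, so the sketch simply assumes some score is available that ranks units by their contribution to biased outputs.

```python
import numpy as np


def prune_neurons(w_in, w_out, scores, k):
    """Zero the k hidden neurons with the highest scores (illustrative only).

    w_in:   (hidden, d_model) weights that produce the hidden activations
    w_out:  (d_model, hidden) weights that consume the hidden activations
    scores: (hidden,) hypothetical per-neuron bias-attribution scores
    """
    pruned = np.argsort(scores)[::-1][:k]  # indices of the k largest scores
    w_in, w_out = w_in.copy(), w_out.copy()
    w_in[pruned, :] = 0.0   # the neuron no longer receives input
    w_out[:, pruned] = 0.0  # and no longer contributes to the output
    return w_in, w_out, np.sort(pruned)


def prune_heads(w_o, head_scores, k, head_dim):
    """Zero the output-projection slice of the k highest-scoring heads.

    w_o:         (d_model, n_heads * head_dim) attention output projection
    head_scores: (n_heads,) hypothetical per-head bias-attribution scores
    """
    pruned = np.argsort(head_scores)[::-1][:k]
    w_o = w_o.copy()
    for h in pruned:
        # Removing a head drops head_dim columns at once, a coarser cut
        # than removing individual neurons.
        w_o[:, h * head_dim:(h + 1) * head_dim] = 0.0
    return w_o, np.sort(pruned)
```

In this sketch a single head removal discards `head_dim` units together, whereas neuron pruning selects units one at a time. That difference in granularity is one plausible reading of why the abstract reports better results for neuron-based strategies, though the abstract itself does not give the reason.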
