CL | AI | Jul 27, 2024

Inference-Time Selective Debiasing to Enhance Fairness in Text Classification Models

arXiv:2407.19345v4 | 12 citations | h-index: 17
AI Analysis

This work addresses fairness issues in text classification for scenarios where retraining the model is impractical. The contribution is incremental: it builds on existing debiasing and selective classification methods.

The paper proposes selective debiasing, an inference-time method for improving fairness in text classification models. It identifies potentially biased predictions and corrects them with LEACE, which narrows the performance gap between post-processing methods and at-training or pre-processing debiasing techniques.

We propose selective debiasing -- an inference-time safety mechanism designed to enhance the overall model quality in terms of prediction performance and fairness, especially in scenarios where retraining the model is impractical. The method draws inspiration from selective classification, where at inference time, predictions with low quality, as indicated by their uncertainty scores, are discarded. In our approach, we identify the potentially biased model predictions and, instead of discarding them, we remove bias from these predictions using LEACE -- a post-processing debiasing method. To select problematic predictions, we propose a bias quantification approach based on KL divergence, which achieves better results than standard uncertainty quantification methods. Experiments on text classification datasets with encoder-based classification models demonstrate that selective debiasing helps to reduce the performance gap between post-processing methods and debiasing techniques from the at-training and pre-processing categories.
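
The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which distributions the KL divergence compares, so this example assumes the bias score is the KL divergence between the original model's predicted class distribution and the distribution obtained after debiasing (for instance, from the same classifier head applied to LEACE-processed representations, computed elsewhere). The function names `kl_divergence` and `selective_debias` and the `fraction` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np


def kl_divergence(p, q, eps=1e-12):
    """Row-wise KL(p || q) for two arrays of class probabilities."""
    # Clip to avoid log(0); assumes each row is already a probability vector.
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * (np.log(p) - np.log(q)), axis=-1)


def selective_debias(probs_orig, probs_debiased, fraction=0.2):
    """Replace the most 'suspicious' predictions with their debiased versions.

    Assumption (not confirmed by the source): a prediction is treated as
    potentially biased when debiasing changes its distribution a lot,
    measured by KL(original || debiased).
    """
    probs_orig = np.asarray(probs_orig, dtype=float)
    probs_debiased = np.asarray(probs_debiased, dtype=float)

    scores = kl_divergence(probs_orig, probs_debiased)

    # Select the top `fraction` of instances by bias score.
    n_select = int(np.ceil(fraction * len(scores)))
    selected = np.argsort(-scores)[:n_select]

    # Keep original predictions everywhere except the selected instances.
    final = probs_orig.copy()
    final[selected] = probs_debiased[selected]
    return final.argmax(axis=-1), scores, selected
```

Unlike selective classification, which would discard the selected instances, this sketch keeps every instance and only swaps in the debiased prediction where the score is highest. The selection budget (`fraction` here) is an illustrative knob; how the paper sets it is not stated in the abstract.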

