Signal Is Harder To Learn Than Bias: Debiasing with Focal Loss
This work addresses biased decision-making in classifiers, which may improve model reliability and interpretability for practitioners. The contribution appears incremental, since it builds on existing debiasing methods.
The paper tackles the problem of neural networks learning spurious correlations (biases) instead of the underlying decision rule, which can lead to poor generalization and unintended consequences. It proposes Signal is Harder (SiH), a variational-autoencoder-based method that trains a biased and an unbiased classifier simultaneously using a novel reweighting scheme inspired by the focal loss. The unbiased classifier matches or improves upon state-of-the-art debiasing performance, and a latent-space perturbation scheme visualizes the sources of bias.
Spurious correlations are everywhere. While humans often do not perceive them, neural networks are notorious for learning unwanted associations, also known as biases, instead of the underlying decision rule. As a result, practitioners are often unaware of the biased decision-making of their classifiers. Such a biased model based on spurious correlations might not generalize to unobserved data, leading to unintended, adverse consequences. We propose Signal is Harder (SiH), a variational-autoencoder-based method that simultaneously trains a biased and unbiased classifier using a novel, disentangling reweighting scheme inspired by the focal loss. Using the unbiased classifier, SiH matches or improves upon the performance of state-of-the-art debiasing methods. To improve the interpretability of our technique, we propose a perturbation scheme in the latent space for visualizing the bias that helps practitioners become aware of the sources of spurious correlations.
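To make the focal-loss idea concrete, the following is a minimal NumPy sketch of the standard focal loss, -(1 - p_t)^gamma * log(p_t), together with one way its modulating factor could be reused as a per-sample weight. The abstract does not specify SiH's exact reweighting scheme, so the function `focal_style_weights`, the choice `gamma=2.0`, and the idea of deriving weights from the biased classifier's confidence are illustrative assumptions, not the authors' method.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the class axis."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def true_class_prob(logits, labels):
    """Probability p_t that the model assigns to the correct class."""
    probs = softmax(logits)
    return probs[np.arange(len(labels)), labels]


def focal_loss(logits, labels, gamma=2.0):
    """Standard per-sample focal loss: -(1 - p_t)^gamma * log(p_t).

    With gamma = 0 this reduces to ordinary cross-entropy.
    """
    p_t = true_class_prob(logits, labels)
    return -((1.0 - p_t) ** gamma) * np.log(p_t)


def focal_style_weights(biased_logits, labels, gamma=2.0):
    """Hypothetical reweighting (an assumption, not the paper's scheme).

    Samples that the biased classifier already predicts confidently
    receive small weights; samples it gets wrong receive large weights.
    """
    p_t = true_class_prob(biased_logits, labels)
    return (1.0 - p_t) ** gamma


# Two samples with true label 0: the biased classifier is confidently
# right on the first and confidently wrong on the second.
biased_logits = np.array([[4.0, 0.0], [0.0, 4.0]])
labels = np.array([0, 0])

weights = focal_style_weights(biased_logits, labels)
losses = focal_loss(biased_logits, labels)
print("weights:", weights)
print("focal losses:", losses)
```

In this sketch, the sample that the biased classifier handles easily is down-weighted and the one it misclassifies is up-weighted, which is the intuition behind training a second classifier on examples where the bias alone does not suffice.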