NeuronTune: Towards Self-Guided Spurious Bias Mitigation
This work addresses the loss of model robustness caused by spurious correlations. It offers practitioners a practical, self-guided solution that does not rely on hard-to-obtain annotations, though the contribution is incremental because it builds on existing post hoc intervention methods.
The paper tackles spurious bias in deep neural networks, where models rely on non-essential features such as backgrounds to make predictions. It proposes NeuronTune, a post hoc method that identifies and regulates the neurons responsible for spurious behaviors without needing external annotations. The authors report significant mitigation across architectures and data modalities.
Deep neural networks often develop spurious bias: a reliance on correlations between non-essential features and classes when making predictions. For example, a model may identify objects based on frequently co-occurring backgrounds rather than intrinsic features, resulting in degraded performance on data lacking these correlations. Existing mitigation approaches typically depend on external annotations of spurious correlations, which may be difficult to obtain and are not relevant to the spurious bias in a model. In this paper, we take a step towards self-guided mitigation of spurious bias by proposing NeuronTune, a post hoc method that directly intervenes in a model's internal decision process. Our method probes a model's latent embedding space to identify and regulate neurons that lead to spurious prediction behaviors. We theoretically justify our approach and show that it brings the model closer to an unbiased one. Unlike previous methods, NeuronTune operates without requiring spurious correlation annotations, making it a practical and effective tool for improving model robustness. Experiments across different architectures and data modalities demonstrate that our method significantly mitigates spurious bias in a self-guided way.
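
The identify-and-regulate idea described above can be illustrated with a minimal sketch. The abstract does not state how neurons are selected or regulated, so everything specific here is an assumption made for illustration: the selection rule (flag an embedding dimension when its median activation is higher on misclassified samples than on correctly classified ones), the regulation step (zeroing the flagged dimensions), and the hypothetical helper names `find_suspect_neurons` and `regulate`. It uses only model predictions and class labels, with no spurious-correlation annotations.

```python
from statistics import median


def find_suspect_neurons(embeddings, labels, predictions):
    """Return indices of embedding dimensions flagged as suspect.

    Assumed rule (not taken from the paper): a dimension is suspect when its
    median activation over misclassified samples exceeds its median
    activation over correctly classified samples.
    """
    correct = [e for e, y, p in zip(embeddings, labels, predictions) if y == p]
    wrong = [e for e, y, p in zip(embeddings, labels, predictions) if y != p]
    if not correct or not wrong:
        return []  # one group is empty, so there is nothing to compare
    suspect = []
    for d in range(len(embeddings[0])):
        med_correct = median(row[d] for row in correct)
        med_wrong = median(row[d] for row in wrong)
        if med_wrong > med_correct:
            suspect.append(d)
    return suspect


def regulate(embedding, suspect):
    """Zero out the suspect dimensions of one embedding vector (assumed regulation step)."""
    blocked = set(suspect)
    return [0.0 if d in blocked else v for d, v in enumerate(embedding)]


# Toy data: dimension 0 behaves like a core feature, dimension 1 like a background cue.
embeddings = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.8]]
labels = [1, 1, 1, 1]
predictions = [1, 1, 0, 0]  # the last two samples are misclassified

suspect = find_suspect_neurons(embeddings, labels, predictions)
cleaned = [regulate(e, suspect) for e in embeddings]
print(suspect)
```

In a post hoc setting, the regulated embeddings would typically feed a classification head that is then re-fitted, but that step is outside this sketch and is not specified in the abstract.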