CV · Jan 24, 2025

Single-weight Model Editing for Post-hoc Spurious Correlation Neutralization

arXiv:2501.14182v2 · 1 citation · h-index: 32
Originality: Incremental advance
AI Analysis

The paper addresses spurious correlations in already-deployed models, with the goal of more reliable prediction. The advance is incremental because it builds on existing class removal schemes.

The paper tackles the problem of neural networks exploiting spuriously correlated features, which leads to incorrect predictions. It proposes a post-hoc method that neutralizes the impact of a spurious feature by modifying a single weight, and reports performance that is competitive with, or better than, state-of-the-art methods.

Neural network training tends to exploit the simplest features as shortcuts to greedily minimize training loss. However, some of these features might be spuriously correlated with the target labels, leading to incorrect predictions by the model. Several methods have been proposed to address this issue. Focusing on suppressing the spurious correlations with model training, they not only incur additional training cost, but also have limited practical utility as the model misbehavior due to spurious relations is usually discovered after its deployment. It is also often overlooked that spuriousness is a subjective notion. Hence, the precise questions that must be investigated are: to what degree a feature is spurious, and how we can proportionally distract the model's attention from it for reliable prediction. To this end, we propose a method that enables post-hoc neutralization of spurious feature impact, controllable to an arbitrary degree. We conceptualize spurious features as fictitious sub-classes within the original classes, which can be eliminated by a class removal scheme. We then propose a unique precise class removal technique that makes a single-weight modification, which entails negligible performance compromise for the remaining classes. We perform extensive experiments, demonstrating that by editing just a single weight in a post-hoc manner, our method achieves highly competitive, or better, performance against the state-of-the-art methods.
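The abstract does not say which weight is edited or how its new value is chosen, so the following is only a toy sketch of the general idea, not the paper's algorithm. It assumes a hypothetical linear head in which the spurious feature has its own output unit (a "fictitious sub-class" of class B), and it suppresses that unit by lowering a single bias entry. The names `edit_single_weight`, `degree`, and `max_shift`, along with all numeric values, are illustrative assumptions.

```python
# Toy sketch only: this is NOT the paper's algorithm. It illustrates the
# general idea of removing one (sub-)class from a linear head by changing
# a single scalar, under the assumptions stated above.

def logits(W, b, x):
    """Linear head: one logit per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def predict_label(W, b, x, unit_to_label):
    """Pick the top-scoring unit, then map it back to its original class."""
    z = logits(W, b, x)
    top_unit = max(range(len(z)), key=lambda k: z[k])
    return unit_to_label[top_unit]


def edit_single_weight(b, unit, degree, max_shift=10.0):
    """Return a copy of the biases with exactly one entry lowered.

    `degree` in [0, 1] controls how strongly the unit is suppressed;
    `max_shift` is a hypothetical scale, not a value from the paper.
    """
    if not 0.0 <= degree <= 1.0:
        raise ValueError("degree must lie in [0, 1]")
    edited = list(b)
    edited[unit] -= degree * max_shift
    return edited


# Hypothetical head with inputs [core feature of A, core feature of B, shortcut].
# Unit 0 scores class A, unit 1 scores class B, and unit 2 is the fictitious
# sub-class "B predicted through the shortcut feature".
W = [[2.0, 0.0, 0.0],
     [0.0, 2.0, 0.0],
     [0.0, 0.0, 2.0]]
b = [0.0, 0.0, 0.0]
UNIT_TO_LABEL = ["A", "B", "B"]

class_a_with_shortcut = [1.0, 0.0, 1.5]  # truly class A, but shortcut is present
class_b_sample = [0.0, 1.0, 1.5]         # truly class B

b_edited = edit_single_weight(b, unit=2, degree=1.0)

print(predict_label(W, b, class_a_with_shortcut, UNIT_TO_LABEL))         # B
print(predict_label(W, b_edited, class_a_with_shortcut, UNIT_TO_LABEL))  # A
```

In this toy setup the edit changes one scalar, the class-A sample is no longer captured by the shortcut unit, and the class-B sample is still labeled B through its core feature. A smaller `degree` suppresses the shortcut unit less, which loosely mirrors the "controllable to an arbitrary degree" claim.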

