Out of spuriousity: Improving robustness to spurious correlations without group annotations
This work addresses robustness issues for machine learning models in real-world applications where spurious correlations are common, offering an incremental improvement over existing methods by removing the need for group labels.
The paper tackles the problem of machine learning models learning spurious correlations, which harms performance in data groups without these correlations and reduces generalization. The result is an approach that extracts a subnetwork from a trained network to improve worst-group performance, demonstrating robustness without group annotations.
Machine learning models are known to learn spurious correlations, i.e., features that are strongly associated with class labels but have no causal relation to them. Relying on those correlations leads to poor performance on data groups where the correlations do not hold and to poor generalization. To improve the robustness of machine learning models to spurious correlations, we propose an approach to extract, from a fully trained network, a subnetwork that does not rely on spurious correlations. The subnetwork is found under the assumption that data points with the same spurious attribute will be close to each other in the representation space when training with empirical risk minimization (ERM); we then employ a supervised contrastive loss in a novel way to force models to unlearn the spurious connections. The increase in worst-group performance achieved by our approach strengthens the hypothesis that a fully trained dense network contains a subnetwork that uses only invariant features in classification tasks, thereby erasing the influence of spurious features even in the setting of multiple spurious attributes and no prior knowledge of attribute labels.
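
For readers unfamiliar with the supervised contrastive loss mentioned in the abstract, the following is a minimal NumPy sketch of its standard form: embeddings that share a label are pulled together and all others are pushed apart. It is not the paper's variant. The abstract does not say how the loss is adapted or which labels define the positive pairs, so the function name `supervised_contrastive_loss`, the `temperature` default, and the toy data are all illustrative assumptions.

```python
import numpy as np


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Standard supervised contrastive loss (illustrative, not the paper's variant).

    features: (n, d) array of embeddings with n >= 2 and non-zero rows.
    labels:   (n,) array; samples sharing a label are treated as positives.
    Returns the mean loss over anchors that have at least one positive.
    """
    z = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)

    # L2-normalize so that dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    n = z.shape[0]

    # Pairwise similarities scaled by the temperature.
    logits = z @ z.T / temperature

    # Exclude each anchor's similarity with itself from the softmax.
    self_mask = np.eye(n, dtype=bool)
    logits = np.where(self_mask, -np.inf, logits)

    # Numerically stable log-softmax over every row.
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Positives: same label as the anchor, excluding the anchor itself.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        return 0.0  # no anchor has a positive, so there is nothing to contrast

    # Average log-probability of the positives for each valid anchor.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float((-pos_log_prob[valid] / pos_count[valid]).mean())


# Toy embeddings (assumed): two tight clusters in a 2-D representation space.
toy_features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
loss_aligned = supervised_contrastive_loss(toy_features, np.array([0, 0, 1, 1]))
loss_misaligned = supervised_contrastive_loss(toy_features, np.array([0, 1, 0, 1]))
```

In this toy example the loss is small when the labels follow the cluster structure and large when they cut across it. That is the property a contrastive objective relies on when it reshapes which features a representation groups by.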