LG AI CY ML · Dec 7, 2020

Removing Spurious Features can Hurt Accuracy and Affect Groups Disproportionately

arXiv:2012.04104v120.271 citations

Originality Highly original

AI Analysis

This work is significant for machine learning practitioners and researchers concerned with fairness and robustness, as it reveals a counter-intuitive negative effect of a common debiasing strategy.

This paper demonstrates that removing spurious features can decrease accuracy in overparameterized models, even in balanced datasets, due to inductive biases. It also shows that this removal can disproportionately affect different groups and make models susceptible to other spurious features. However, robust self-training can remove spurious features without impacting overall accuracy.

The presence of spurious features interferes with the goal of obtaining robust models that perform well across many groups within the population. A natural remedy is to remove spurious features from the model. However, in this work we show that removal of spurious features can decrease accuracy due to the inductive biases of overparameterized models. We completely characterize how the removal of spurious features affects accuracy across different groups (more generally, test distributions) in noiseless overparameterized linear regression. In addition, we show that removal of spurious features can decrease accuracy even in balanced datasets, where each target co-occurs equally with each spurious feature, and that it can inadvertently make the model more susceptible to other spurious features. Finally, we show that robust self-training can remove spurious features without affecting the overall accuracy. Experiments on the Toxic-Comment-Detection and CelebA datasets show that our results hold in non-linear models.
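The effect described in the abstract can be illustrated with a minimal sketch of noiseless overparameterized linear regression. Everything specific here is an assumption made for illustration and is not taken from the paper: the single training point, the parameter vectors `theta_star` and `beta_star`, the helper names, and the choice to model the spurious feature as a linear function of the core features `z`. With one training point and three or four features, many models interpolate the data, and the sketch picks the minimum-norm one, which is one common way to model the inductive bias of overparameterized regression.

```python
# Toy illustration with hypothetical numbers (not the paper's experiments).

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def min_norm_interpolant(x, y):
    """Minimum-norm w with dot(w, x) == y, for a single training point."""
    scale = y / dot(x, x)
    return [scale * xi for xi in x]

def compare(beta_star, theta_star=(2.0, 2.0, 2.0), z_train=(1.0, 0.0, 0.0)):
    """Return (err_core, err_full): squared parameter error of the model
    trained without / with the spurious feature s = dot(beta_star, z)."""
    z_train = list(z_train)
    y_train = dot(theta_star, z_train)   # noiseless target
    s_train = dot(beta_star, z_train)    # assumed spurious feature

    # Core model: the spurious feature is removed, only z is used.
    theta_core = min_norm_interpolant(z_train, y_train)

    # Full model: z plus the spurious feature as an extra input.
    w_full = min_norm_interpolant(z_train + [s_train], y_train)
    w_z, w_s = w_full[:-1], w_full[-1]
    # Since s is linear in z, the full model is equivalent to this
    # parameter vector acting on z alone.
    theta_full = [wz + w_s * b for wz, b in zip(w_z, beta_star)]

    def sq_error(theta):
        # Equals the expected squared prediction error when test inputs
        # z have identity covariance.
        return sum((t - ts) ** 2 for t, ts in zip(theta, theta_star))

    return sq_error(theta_core), sq_error(theta_full)

print(compare([1.0, 1.0, 0.0]))    # (8.0, 5.0): removing s hurts
print(compare([1.0, -1.0, 0.0]))   # (8.0, 13.0): removing s helps
```

In this toy setting, whether removing the spurious feature raises or lowers the error depends on how `beta_star` aligns with the unseen directions of `theta_star`, which is consistent with the abstract's claim that removal can decrease accuracy rather than always doing so.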
