Is Last Layer Re-Training Truly Sufficient for Robustness to Spurious Correlations?
This work addresses the problem of spurious correlations in machine learning models for medical applications, but it is incremental as it builds on existing DFR research.
The paper examines the Deep Feature Reweighting (DFR) method for improving model robustness to spurious correlations, finding that while it can enhance worst-group accuracy, it remains susceptible to these correlations in medical domain data.
Models trained with empirical risk minimization (ERM) are known to learn to rely on spurious features, i.e., their prediction is based on undesired auxiliary features which are strongly correlated with class labels but lack causal reasoning. This behavior particularly degrades accuracy in groups of samples of the correlated class that are missing the spurious feature or samples of the opposite class but with the spurious feature present. The recently proposed Deep Feature Reweighting (DFR) method improves accuracy of these worst groups. Based on the main argument that ERM models can learn core features sufficiently well, DFR only needs to retrain the last layer of the classification model with a small group-balanced data set. In this work, we examine the applicability of DFR to realistic data in the medical domain. Furthermore, we investigate the reasoning behind the effectiveness of last-layer retraining and show that even though DFR has the potential to improve the accuracy of the worst group, it remains susceptible to spurious correlations.
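The two ingredients named in the abstract, worst-group accuracy and last-layer retraining on a small group-balanced set, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`worst_group_accuracy`, `group_balanced_indices`, `retrain_last_layer`, `predict`) are hypothetical, groups are assumed to be (class label, spurious attribute) pairs, the features are assumed to be already extracted by a frozen ERM backbone, and a plain logistic-regression head trained by gradient descent stands in for whatever regularized classifier a full DFR setup would use.

```python
import math
import random
from collections import defaultdict


def worst_group_accuracy(y_true, y_pred, groups):
    """Return (lowest per-group accuracy, dict of accuracy per group)."""
    correct, total = defaultdict(int), defaultdict(int)
    for y, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(y == p)
    per_group = {g: correct[g] / total[g] for g in total}
    return min(per_group.values()), per_group


def group_balanced_indices(groups, seed=0):
    """Subsample every group down to the size of the smallest group."""
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    n = min(len(idx) for idx in by_group.values())
    rng = random.Random(seed)
    picked = []
    for g in sorted(by_group):
        picked.extend(rng.sample(by_group[g], n))
    return sorted(picked)


def retrain_last_layer(features, labels, epochs=200, lr=0.5):
    """Fit a binary logistic-regression head on frozen features.

    Assumption: the backbone is not updated, so `features` are fixed
    vectors and only the weights `w` and bias `b` are learned.
    """
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - y  # sigmoid(z) - label
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * gj / n for wi, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(features, w, b):
    """Threshold the linear head at zero to get 0/1 predictions."""
    return [int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0) for x in features]
```

In use, one would pick indices with `group_balanced_indices` on a held-out set, call `retrain_last_layer` on the corresponding features and labels, and compare `worst_group_accuracy` before and after. An improvement in that number alone does not show that the head has stopped using the spurious feature, which is the caveat the abstract raises.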