Limitations of Post-Hoc Feature Alignment for Robustness
This work questions the practical utility of feature alignment, and of unsupervised domain adaptation more broadly, for improving robustness in machine learning, showing that the gains are confined to a narrow set of distribution shifts.
The paper investigates the limitations of post-hoc feature alignment using batch normalization statistics for robustness to distribution shift, finding that it helps only with a narrow set of shifts and can degrade performance in some settings.
Feature alignment is an approach to improving robustness to distribution shift that matches the distribution of feature activations between the training distribution and the test distribution. A particularly simple but effective form of feature alignment aligns the batch normalization statistics between the two distributions in a trained neural network. This technique has received renewed interest recently because of its impressive performance on robustness benchmarks. However, when and why this method works is not well understood. We investigate the approach in more detail and identify several limitations. We show that it helps significantly only with a narrow set of distribution shifts, and we identify several settings in which it even degrades performance. We also explain why these limitations arise by pinpointing why this approach can be so effective in the first place. Our findings call into question the utility of this approach, and of unsupervised domain adaptation more broadly, for improving robustness in practice.
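To make the idea of aligning batch normalization statistics concrete, the following is a minimal sketch for a single feature channel, written in plain Python. It is an illustration under stated assumptions, not the procedure evaluated in the paper: the function names, the activation values, and the synthetic scale-and-offset shift are all hypothetical. The shift is chosen as the favorable case in which re-estimating the statistics from an unlabeled test batch recovers the source normalization; it does not illustrate the failure settings mentioned above.

```python
import math
from statistics import fmean, pvariance


def batch_norm(values, mean, var, eps=1e-5):
    """Normalize one feature channel with the given mean and variance."""
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def normalize_with_source_stats(test_values, source_mean, source_var):
    # Standard inference: reuse the statistics stored during training.
    return batch_norm(test_values, source_mean, source_var)


def normalize_with_test_stats(test_values):
    # Post-hoc alignment (assumed form): re-estimate the statistics
    # from the unlabeled test batch itself.
    return batch_norm(test_values, fmean(test_values), pvariance(test_values))


# Hypothetical activations of one channel on the training distribution.
source_acts = [-1.0, 0.0, 1.0, 2.0, -2.0]
# Synthetic shift (an assumption): the same activations, rescaled and offset.
shifted_acts = [3.0 * a + 5.0 for a in source_acts]

source_mean = fmean(source_acts)
source_var = pvariance(source_acts)

reference = batch_norm(source_acts, source_mean, source_var)
unaligned = normalize_with_source_stats(shifted_acts, source_mean, source_var)
aligned = normalize_with_test_stats(shifted_acts)

print("mean without alignment:", fmean(unaligned))
print("mean with alignment:   ", fmean(aligned))
```

In this toy case the aligned activations match the ones the downstream layers saw during training, while the unaligned ones are offset and rescaled. In a real network the same re-estimation would be applied per channel in every batch normalization layer, and how well it works depends on the kind of shift, which is the question the paper examines.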