Large Learning Rates Simultaneously Achieve Robustness to Spurious Correlations and Compressibility
This work addresses the problem of improving model reliability and efficiency for machine learning practitioners, though the contribution appears incremental because it builds on known effects of learning rates.
The paper tackles the challenge of jointly achieving robustness to spurious correlations and model compressibility in machine learning, finding that high learning rates facilitate both properties and produce desirable representation features, with evidence across diverse datasets and models.
Robustness and resource-efficiency are two highly desirable properties for modern machine learning models. However, achieving them jointly remains a challenge. In this paper, we identify high learning rates as a facilitator for simultaneously achieving robustness to spurious correlations and network compressibility. We demonstrate that large learning rates also produce desirable representation properties such as invariant feature utilization, class separation, and activation sparsity. Our findings indicate that large learning rates compare favorably to other hyperparameters and regularization methods in consistently satisfying these properties in tandem. In addition to demonstrating the positive effect of large learning rates across diverse spurious correlation datasets, models, and optimizers, we also present strong evidence that the previously documented success of large learning rates in standard classification tasks is related to addressing hidden/rare spurious correlations in the training dataset. Our investigation of the mechanisms underlying this phenomenon reveals the importance of confident mispredictions of bias-conflicting samples under large learning rates.
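To make two of the representation properties named above more concrete, the following is a minimal sketch of how activation sparsity and class separation might be measured on a layer's outputs. The definitions used here (fraction of near-zero activations, and a ratio of between-class to within-class squared distance) are illustrative assumptions, not necessarily the metrics used in the paper; the function names, the `eps` threshold, and the toy data are likewise hypothetical.

```python
from statistics import fmean


def activation_sparsity(activations, eps=1e-6):
    """Fraction of activation entries whose magnitude is at most eps (assumed definition)."""
    total = sum(len(row) for row in activations)
    near_zero = sum(1 for row in activations for a in row if abs(a) <= eps)
    return near_zero / total


def class_separation(features, labels):
    """Between-class over within-class squared distance (assumed definition; higher means more separated)."""
    dim = len(features[0])
    global_mean = [fmean(f[d] for f in features) for d in range(dim)]

    # Group feature vectors by their class label.
    by_class = {}
    for f, y in zip(features, labels):
        by_class.setdefault(y, []).append(f)

    between = 0.0
    within = 0.0
    for rows in by_class.values():
        mean = [fmean(r[d] for r in rows) for d in range(dim)]
        # Class-size-weighted squared distance of the class mean from the global mean.
        between += len(rows) * sum((m - g) ** 2 for m, g in zip(mean, global_mean))
        # Squared distance of each sample from its own class mean.
        within += sum((x - m) ** 2 for r in rows for x, m in zip(r, mean))
    return between / within if within > 0 else float("inf")


# Toy post-ReLU representations for two classes (made-up numbers, not paper data).
feats = [[0.0, 2.0], [0.0, 2.2], [1.9, 0.0], [2.1, 0.0]]
labels = [0, 0, 1, 1]
sparsity = activation_sparsity(feats)
separation = class_separation(feats, labels)
```

Under these assumed definitions, one could compute both quantities for models trained at different learning rates and compare them; that comparison is only suggested by the abstract, and the sketch does not reproduce it.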