Sharpness-Aware Minimization Can Hallucinate Minimizers
This work identifies a failure mode of Sharpness-Aware Minimization (SAM), a widely used optimization method for training neural networks, that may undermine the generalization it is meant to improve.
The paper shows that Sharpness-Aware Minimization (SAM) can converge to hallucinated minimizers, points that are not minimizers of the original objective. It supports this claim with a theoretical proof and empirical evidence, and proposes a remedy.
Sharpness-Aware Minimization (SAM) is a widely used method that steers training toward flatter minimizers, which typically generalize better. In this work, however, we show that SAM can converge to hallucinated minimizers -- points that are not minimizers of the original objective. We theoretically prove the existence of such hallucinated minimizers and establish conditions for local convergence to them. We further provide empirical evidence demonstrating that SAM can indeed converge to these points in practice. Finally, we propose a simple yet effective remedy for avoiding hallucinated minimizers.
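To make the idea concrete, the sketch below applies the commonly used SAM update, x <- x - lr * grad_f(x + rho * grad_f(x) / |grad_f(x)|), to a one-dimensional toy objective. The objective f(x) = cos(x), the values of `lr` and `rho`, and the names `grad_f`, `sam_step`, and `x_h` are all illustrative assumptions, not taken from the paper. The example only shows that the SAM update can have a fixed point where the true gradient is nonzero; it does not reproduce the paper's convergence conditions or its proposed remedy.

```python
import math


def grad_f(x):
    # Toy objective (assumption): f(x) = cos(x), so f'(x) = -sin(x).
    # f has a local maximizer at x = 0 and minimizers at x = +/- pi.
    return -math.sin(x)


def sam_step(x, lr=0.1, rho=0.5):
    # One step of the standard SAM update in 1D:
    #   1. ascend by rho along the normalized gradient,
    #   2. descend using the gradient evaluated at the perturbed point.
    g = grad_f(x)
    if g == 0.0:
        # Ascent direction is undefined at a stationary point; stay put.
        return x
    x_perturbed = x + rho * g / abs(g)
    return x - lr * grad_f(x_perturbed)


# Candidate point (assumption): exactly rho to the left of the maximizer at 0.
x_h = -0.5

true_gradient = grad_f(x_h)      # nonzero, so x_h is not stationary for f
after_sam = sam_step(x_h)        # the ascent step lands on x = 0, where f' = 0
after_gd = x_h - 0.1 * true_gradient  # plain gradient descent does move

print("true gradient at x_h:", true_gradient)
print("SAM leaves x_h fixed:", after_sam == x_h)
print("gradient descent moves x_h:", after_gd != x_h)
```

In this toy case the ascent step carries x_h onto the local maximizer of f, where the gradient vanishes, so SAM sees no descent direction even though x_h itself sits on a slope. In one dimension this particular fixed point is unstable under small perturbations, so the sketch illustrates only the existence of such points, not convergence to them.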