Non-Convex Optimization with Spectral Radius Regularization
This work addresses generalization issues in deep learning for applications like healthcare, though it appears incremental as it builds on existing flat minima concepts.
The paper tackles the problem of finding flat minima in deep neural network training to improve generalization, proposing a spectral radius regularization method that outperforms baseline models on real-world test data whose distribution may differ from that of the training data.
We develop regularization methods to find flat minima while training deep neural networks. These minima generalize better than sharp minima, yielding models that outperform baselines on real-world test data (which may be distributed differently than the training data). Specifically, we propose a method of regularized optimization to reduce the spectral radius of the Hessian of the loss function. We also derive algorithms to efficiently optimize neural network models and prove that these algorithms almost surely converge. Furthermore, we demonstrate that our algorithm works effectively on applications in different domains, including healthcare. To show that our models generalize well, we introduce various methods for testing generalizability and find that our models outperform comparable baseline models on these tests.
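To make the regularized quantity concrete, the sketch below estimates the spectral radius of a Hessian by power iteration on Hessian-vector products and adds it to the loss as a penalty. It is an illustration under stated assumptions, not the authors' algorithm: the toy loss, the finite-difference Hessian-vector product, the penalty weight mu, and all function names are hypothetical choices made for this example.

```python
import math
import random


def loss(w):
    # Hypothetical toy non-convex loss: f(w) = w0^2 + w0*w1 - 1.5*w1^2.
    # Its Hessian is the constant matrix [[2, 1], [1, -3]].
    return w[0] ** 2 + w[0] * w[1] - 1.5 * w[1] ** 2


def grad_loss(w):
    # Analytic gradient of the toy loss above.
    return [2.0 * w[0] + w[1], w[0] - 3.0 * w[1]]


def hvp(grad, w, v, eps=1e-4):
    # Central finite-difference Hessian-vector product:
    # H v is approximated by (grad(w + eps*v) - grad(w - eps*v)) / (2*eps).
    g_plus = grad([wi + eps * vi for wi, vi in zip(w, v)])
    g_minus = grad([wi - eps * vi for wi, vi in zip(w, v)])
    return [(a - b) / (2.0 * eps) for a, b in zip(g_plus, g_minus)]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def spectral_radius(grad, w, iters=200, seed=0):
    # Power iteration: repeatedly apply H to a unit vector and renormalize.
    # The norm of H v converges to the largest eigenvalue magnitude.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in w]
    n = norm(v)
    v = [x / n for x in v]
    rho = 0.0
    for _ in range(iters):
        hv = hvp(grad, w, v)
        rho = norm(hv)
        if rho == 0.0:
            break
        v = [x / rho for x in hv]
    return rho


w = [0.5, -0.25]
mu = 0.1  # assumed penalty weight, chosen only for illustration
rho = spectral_radius(grad_loss, w)
penalized = loss(w) + mu * rho  # loss plus spectral radius penalty
```

For this toy Hessian the eigenvalues are (-1 ± sqrt(29)) / 2, so the estimate should approach (1 + sqrt(29)) / 2, the larger of the two in magnitude. Power iteration needs only Hessian-vector products, which is why this style of estimate is attractive when the full Hessian is too large to form.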