Strong convexity-guided hyper-parameter optimization for flatter losses
This paper addresses hyper-parameter tuning for neural networks, but the contribution appears incremental because it builds on existing flat-minima theory.
The paper tackles hyper-parameter optimization by linking the strong convexity of the loss to its flatness, with the aim of better generalization, and shows that the method achieves strong performance at reduced runtime on 14 classification datasets.
We propose a novel white-box approach to hyper-parameter optimization. Motivated by recent work establishing a relationship between flat minima and generalization, we first establish a relationship between the strong convexity of the loss and its flatness. Based on this, we seek to find hyper-parameter configurations that improve flatness by minimizing the strong convexity of the loss. By using the structure of the underlying neural network, we derive closed-form equations to approximate the strong convexity parameter, and attempt to find hyper-parameters that minimize it in a randomized fashion. Through experiments on 14 classification datasets, we show that our method achieves strong performance at a fraction of the runtime.
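To make the idea concrete, recall that a twice-differentiable loss is mu-strongly convex when every eigenvalue of its Hessian is at least mu, so a smaller mu permits flatter directions. The sketch below is only an illustration of a randomized search that favors a small mu. It is not the paper's closed-form approximation: it assumes a toy two-feature ridge-regression loss whose Hessian is known exactly, and the names `strong_convexity_param`, `random_search`, and `weight_decay` are hypothetical choices made for this example.

```python
import math
import random


def min_eig_2x2(a, b, c):
    """Smallest eigenvalue of the symmetric matrix [[a, b], [b, c]]."""
    return (a + c) / 2 - math.sqrt(((a - c) / 2) ** 2 + b ** 2)


def strong_convexity_param(X, weight_decay):
    """Strong convexity parameter of the toy ridge loss
    L(w) = ||Xw - y||^2 / (2n) + weight_decay * ||w||^2 / 2,
    i.e. the smallest eigenvalue of its Hessian X^T X / n + weight_decay * I.
    Assumption: a toy stand-in for a neural-network loss."""
    n = len(X)
    a = sum(x0 * x0 for x0, _ in X) / n + weight_decay
    b = sum(x0 * x1 for x0, x1 in X) / n
    c = sum(x1 * x1 for _, x1 in X) / n + weight_decay
    return min_eig_2x2(a, b, c)


def random_search(X, n_trials=20, seed=0):
    """Sample hyper-parameters at random and keep the configuration
    with the smallest strong convexity parameter."""
    rng = random.Random(seed)
    best_config, best_mu = None, math.inf
    for _ in range(n_trials):
        # Log-uniform sample of the (hypothetical) weight-decay range.
        weight_decay = 10.0 ** rng.uniform(-4, 0)
        mu = strong_convexity_param(X, weight_decay)
        if mu < best_mu:
            best_config, best_mu = {"weight_decay": weight_decay}, mu
    return best_config, best_mu


# Synthetic two-feature inputs, used only to exercise the search loop.
data_rng = random.Random(1)
X = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(50)]
config, mu = random_search(X)
print(config, mu)
```

In this toy setting mu grows monotonically with `weight_decay`, so the search simply prefers the smallest sampled value. The example shows only the shape of the loop (sample a configuration, estimate mu, keep the minimizer), not the trade-offs a real network would introduce.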