SA-GD: Improved Gradient Descent Learning Strategy with Simulated Annealing
The work addresses optimization challenges in deep learning for researchers and practitioners, but its contribution is incremental, since it combines the existing simulated annealing technique with gradient descent.
The paper tackles the problem of gradient descent getting trapped in local minima and saddle points in non-convex optimization, such as in deep learning, by proposing SA-GD, which integrates simulated annealing to help models escape these regions. The results show that SA-GD improves the generalization ability of CNN models across benchmark datasets without compromising convergence efficiency or stability, and that it also serves as an effective ensemble learning approach that significantly boosts performance.
Gradient descent is the most widely used method for optimizing machine learning models. However, loss functions contain many local minima and saddle points, especially in high-dimensional non-convex optimization problems such as deep learning. Gradient descent can become trapped in these local regions, which impedes further optimization and results in poor generalization ability. This paper proposes the SA-GD algorithm, which introduces the idea of simulated annealing into gradient descent. SA-GD gives the model the ability to climb hills with some probability, which tends to let the model jump out of these local regions and eventually converge to an optimal state. We took CNN models as an example and tested basic CNN models on various benchmark datasets. Compared to baseline models trained with traditional gradient descent, models trained with SA-GD possess better generalization ability without sacrificing the efficiency or stability of convergence. In addition, SA-GD can be used as an effective ensemble learning approach that improves the final performance significantly.
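The abstract does not spell out the update rule, so the following is only a minimal sketch of the general idea of combining gradient descent with probabilistic hill climbing. The function name `sa_gd`, its parameters (`t0`, `decay`, `p_max`), the Metropolis-style acceptance rule, and the geometric cooling schedule are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def sa_gd(grad, x0, lr=0.1, steps=500, t0=1.0, decay=0.99, p_max=0.3, seed=0):
    """Toy 1-D gradient descent with annealed, probabilistic uphill steps.

    All parameter names and the acceptance rule are illustrative assumptions.
    Returns the final parameter and the number of uphill steps taken.
    """
    rng = random.Random(seed)
    x = x0
    temperature = t0
    uphill_steps = 0
    for _ in range(steps):
        g = grad(x)
        # First-order estimate of how much an ascent step would raise the loss.
        delta = lr * g * g
        # Assumed Metropolis-style rule, capped at p_max: large uphill moves
        # become rarer as the temperature cools.
        p_up = min(p_max, math.exp(-delta / max(temperature, 1e-12)))
        if rng.random() < p_up:
            x += lr * g  # climb the hill (gradient ascent)
            uphill_steps += 1
        else:
            x -= lr * g  # ordinary gradient descent
        temperature *= decay  # assumed geometric cooling schedule
    return x, uphill_steps


# Usage on the toy loss f(x) = x**2, whose gradient is 2*x.
x_final, n_up = sa_gd(lambda x: 2.0 * x, x0=1.0)
```

Setting `p_max=0.0` disables uphill steps and recovers plain gradient descent, which gives a convenient baseline for comparison. In a real deep learning setting the scalar `x` would be replaced by the model parameters and `grad` by a mini-batch gradient.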