Local Saddle Point Optimization: A Curvature Exploitation Approach
This addresses a systemic issue in optimization for machine learning, offering a solution to improve convergence in saddle point problems, though it appears incremental as it builds on existing gradient methods.
The paper tackles the problem of gradient-based optimization getting stuck at non-optimal stationary points in saddle point problems by proposing a curvature exploitation approach to escape them, proving effectiveness for methods like gradient descent and Adagrad and showing empirical advantages on common benchmarks.
Gradient-based optimization methods are the most popular choice for finding local optima for classical minimization and saddle point problems. Here, we highlight a systemic issue of gradient dynamics that arise for saddle point problems, namely the presence of undesired stable stationary points that are no local optima. We propose a novel optimization approach that exploits curvature information in order to escape from these undesired stationary points. We prove that different optimization methods, including gradient method and Adagrad, equipped with curvature exploitation can escape non-optimal stationary points. We also provide empirical results on common saddle point problems which confirm the advantage of using curvature exploitation.