Escaping Saddle Points via Curvature-Calibrated Perturbations: A Complete Analysis with Explicit Constants and Empirical Validation
This provides a rigorous method for improving optimization in non-convex settings, such as deep learning, though it is incremental relative to prior saddle-escape work.
The paper tackles the problem of escaping saddle points in non-convex optimization by proposing the Perturbed Saddle-escape Descent (PSD) algorithm, achieving an (ε,√ρε)-approximate second-order stationary point with high probability using O(ℓΔ_f/ε²) gradient evaluations for descent plus O((ℓ/√ρε)log(d/δ)) per escape episode.
We present a comprehensive theoretical analysis of first-order methods for escaping strict saddle points in smooth non-convex optimization. Our main contribution is a Perturbed Saddle-escape Descent (PSD) algorithm with fully explicit constants and a rigorous separation between gradient-descent and saddle-escape phases. For a function $f:\mathbb{R}^d\to\mathbb{R}$ with $\ell$-Lipschitz gradient and $ρ$-Lipschitz Hessian, we prove that PSD finds an $(ε,\sqrt{ρε})$-approximate second-order stationary point with high probability using at most $O(\ellΔ_f/ε^2)$ gradient evaluations for the descent phase plus $O((\ell/\sqrt{ρε})\log(d/δ))$ evaluations per escape episode, with at most $O(\ellΔ_f/ε^2)$ episodes needed. We validate our theoretical predictions through extensive experiments across both synthetic functions and practical machine learning tasks, confirming the logarithmic dimension dependence and the predicted per-episode function decrease. We also provide complete algorithmic specifications including a finite-difference variant (PSD-Probe) and a stochastic extension (PSGD) with robust mini-batch sizing.