Finding Second-Order Stationary Points in Nonconvex-Strongly-Concave Minimax Optimization
This addresses a gap in minimax optimization for nonconvex settings, providing the first non-asymptotic convergence guarantees for second-order stationary points, which is important for applications like adversarial training and game theory.
The paper tackles the problem of finding second-order stationary points in nonconvex-strongly-concave minimax optimization, proposing the Minimax Cubic Newton (MCN) method that achieves an approximate second-order stationary point with oracle complexities of O(κ^1.5√ρ ε^{-1.5}) for second-order oracles and Õ(κ^2√ρ ε^{-1.5}) for first-order oracles, and an inexact variant reduces Hessian-vector calls to Õ(κ^1.5ℓ ε^{-2}).
We study the smooth minimax optimization problem $\min_{\bf x}\max_{\bf y} f({\bf x},{\bf y})$, where $f$ is $\ell$-smooth, strongly-concave in ${\bf y}$ but possibly nonconvex in ${\bf x}$. Most of existing works focus on finding the first-order stationary points of the function $f({\bf x},{\bf y})$ or its primal function $P({\bf x})\triangleq \max_{\bf y} f({\bf x},{\bf y})$, but few of them focus on achieving second-order stationary points. In this paper, we propose a novel approach for minimax optimization, called Minimax Cubic Newton (MCN), which could find an $\big(\varepsilon,κ^{1.5}\sqrt{ρ\varepsilon}\,\big)$-second-order stationary point of $P({\bf x})$ with calling ${\mathcal O}\big(κ^{1.5}\sqrtρ\varepsilon^{-1.5}\big)$ times of second-order oracles and $\tilde{\mathcal O}\big(κ^{2}\sqrtρ\varepsilon^{-1.5}\big)$ times of first-order oracles, where $κ$ is the condition number and $ρ$ is the Lipschitz continuous constant for the Hessian of $f({\bf x},{\bf y})$. In addition, we propose an inexact variant of MCN for high-dimensional problems to avoid calling expensive second-order oracles. Instead, our method solves the cubic sub-problem inexactly via gradient descent and matrix Chebyshev expansion. This strategy still obtains the desired approximate second-order stationary point with high probability but only requires $\tilde{\mathcal O}\big(κ^{1.5}\ell\varepsilon^{-2}\big)$ Hessian-vector oracle calls and $\tilde{\mathcal O}\big(κ^{2}\sqrtρ\varepsilon^{-1.5}\big)$ first-order oracle calls. To the best of our knowledge, this is the first work that considers the non-asymptotic convergence behavior of finding second-order stationary points for minimax problems without the convex-concave assumptions.