Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima
This work addresses the challenge of nonconvex optimization in high-dimensional statistics, offering theoretical guarantees for practitioners using methods like SCAD or MCP, though it is incremental in extending existing convex theory to nonconvex settings.
The paper tackles the problem of analyzing local optima in regularized M-estimators with nonconvex loss and penalty functions, proving that any stationary point lies within statistical precision of the true parameter vector and providing error bounds for various metrics.
We provide novel theoretical results regarding local optima of regularized $M$-estimators, allowing for nonconvexity in both loss and penalty functions. Under restricted strong convexity on the loss and suitable regularity conditions on the penalty, we prove that \emph{any stationary point} of the composite objective function will lie within statistical precision of the underlying parameter vector. Our theory covers many nonconvex objective functions of interest, including the corrected Lasso for errors-in-variables linear models; regression for generalized linear models with nonconvex penalties such as SCAD, MCP, and capped-$\ell_1$; and high-dimensional graphical model estimation. We quantify statistical accuracy by providing bounds on the $\ell_1$-, $\ell_2$-, and prediction error between stationary points and the population-level optimum. We also propose a simple modification of composite gradient descent that may be used to obtain a near-global optimum within statistical precision $ε$ in $\log(1/ε)$ steps, which is the fastest possible rate of any first-order method. We provide simulation studies illustrating the sharpness of our theoretical results.