Accelerated Gradient Methods for Sparse Statistical Learning with Nonconvex Penalties
This work addresses convergence issues in high-dimensional sparse learning for researchers and practitioners, though it is incremental as it builds on existing nonconvex accelerated gradient methods.
The authors tackled the problem of slow convergence in accelerated gradient methods for sparse statistical learning with nonconvex penalties by proposing a hyperparameter setting based on complexity bounds, resulting in faster convergence and improved signal recovery compared to state-of-the-art methods.
Nesterov's accelerated gradient (AG) is a popular technique to optimize objective functions comprising two components: a convex loss and a penalty function. While AG methods perform well for convex penalties, such as the LASSO, convergence issues may arise when it is applied to nonconvex penalties, such as SCAD. A recent proposal generalizes Nesterov's AG method to the nonconvex setting. The proposed algorithm requires specification of several hyperparameters for its practical application. Aside from some general conditions, there is no explicit rule for selecting the hyperparameters, and how different selection can affect convergence of the algorithm. In this article, we propose a hyperparameter setting based on the complexity upper bound to accelerate convergence, and consider the application of this nonconvex AG algorithm to high-dimensional linear and logistic sparse learning problems. We further establish the rate of convergence and present a simple and useful bound to characterize our proposed optimal damping sequence. Simulation studies show that convergence can be made, on average, considerably faster than that of the conventional proximal gradient algorithm. Our experiments also show that the proposed method generally outperforms the current state-of-the-art methods in terms of signal recovery.