Zhaosong Lu

h-index32

34papers

1,712citations

Novelty53%

AI Score54

Ranked #11,375 of 194,257 authors (top 6%)#27 in OC (top 3%)

34 Papers

2.5OCMay 29, 2012

Penalty Decomposition Methods for Rank Minimization

Zhaosong Lu, Yong Zhang

In this paper we consider general rank minimization problems with rank appearing in either objective function or constraint. We first establish that a class of special rank minimization problems has closed-form solutions. Using this result, we then propose penalty decomposition methods for general rank minimization problems in which each subproblem is solved by a block coordinate descend method. Under some suitable assumptions, we show that any accumulation point of the sequence generated by the penalty decomposition methods satisfies the first-order optimality conditions of a nonlinear reformulation of the problems. Finally, we test the performance of our methods by applying them to the matrix completion and nearest low-rank correlation matrix problems. The computational results demonstrate that our methods are generally comparable or superior to the existing methods in terms of solution quality.

26.2OCJan 4, 2023

First-order penalty methods for bilevel optimization

Zhaosong Lu, Sanyou Mei

In this paper we study a class of unconstrained and constrained bilevel optimization problems in which the lower level is a possibly nonsmooth convex optimization problem, while the upper level is a possibly nonconvex optimization problem. We introduce a notion of $\varepsilon$-KKT solution for them and show that an $\varepsilon$-KKT solution leads to an $O(\sqrt{\varepsilon})$- or $O(\varepsilon)$-hypergradient based stionary point under suitable assumptions. We also propose first-order penalty methods for finding an $\varepsilon$-KKT solution of them, whose subproblems turn out to be a structured minimax problem and can be suitably solved by a first-order method recently developed by the authors. Under suitable assumptions, an \emph{operation complexity} of $O(\varepsilon^{-4}\log\varepsilon^{-1})$ and $O(\varepsilon^{-7}\log\varepsilon^{-1})$, measured by their fundamental operations, is established for the proposed penalty methods for finding an $\varepsilon$-KKT solution of the unconstrained and constrained bilevel optimization problems, respectively. Preliminary numerical results are presented to illustrate the performance of our proposed methods. To the best of our knowledge, this paper is the first work to demonstrate that bilevel optimization can be approximately solved as minimax optimization, and moreover, it provides the first implementable method with complexity guarantees for such sophisticated bilevel optimization.

4.6OCMay 11, 2012

Penalty Decomposition Methods for $L0$-Norm Minimization

Zhaosong Lu, Yong Zhang

In this paper we consider general l0-norm minimization problems, that is, the problems with l0-norm appearing in either objective function or constraint. In particular, we first reformulate the l0-norm constrained problem as an equivalent rank minimization problem and then apply the penalty decomposition (PD) method proposed in [33] to solve the latter problem. By utilizing the special structures, we then transform all matrix operations of this method to vector operations and obtain a PD method that only involves vector operations. Under some suitable assumptions, we establish that any accumulation point of the sequence generated by the PD method satisfies a first-order optimality condition that is generally stronger than one natural optimality condition. We further extend the PD method to solve the problem with the l0-norm appearing in objective function. Finally, we test the performance of our PD methods by applying them to compressed sensing, sparse logistic regression and sparse inverse covariance selection. The computational results demonstrate that our methods generally outperform the existing methods in terms of solution quality and/or speed.

11.1OCJan 9, 2023

A Newton-CG based augmented Lagrangian method for finding a second-order stationary point of nonconvex equality constrained optimization with complexity guarantees

Chuan He, Zhaosong Lu, Ting Kei Pong

In this paper we consider finding a second-order stationary point (SOSP) of nonconvex equality constrained optimization when a nearly feasible point is known. In particular, we first propose a new Newton-CG method for finding an approximate SOSP of unconstrained optimization and show that it enjoys a substantially better complexity than the Newton-CG method [56]. We then propose a Newton-CG based augmented Lagrangian (AL) method for finding an approximate SOSP of nonconvex equality constrained optimization, in which the proposed Newton-CG method is used as a subproblem solver. We show that under a generalized linear independence constraint qualification (GLICQ), our AL method enjoys a total inner iteration complexity of $\widetilde{\cal O}(ε^{-7/2})$ and an operation complexity of $\widetilde{\cal O}(ε^{-7/2}\min\{n,ε^{-3/4}\})$ for finding an $(ε,\sqrtε)$-SOSP of nonconvex equality constrained optimization with high probability, which are significantly better than the ones achieved by the proximal AL method [60]. Besides, we show that it has a total inner iteration complexity of $\widetilde{\cal O}(ε^{-11/2})$ and an operation complexity of $\widetilde{\cal O}(ε^{-11/2}\min\{n,ε^{-5/4}\})$ when the GLICQ does not hold. To the best of our knowledge, all the complexity results obtained in this paper are new for finding an approximate SOSP of nonconvex equality constrained optimization with high probability. Preliminary numerical results also demonstrate the superiority of our proposed methods over the ones in [56,60].

13.0OCJan 5, 2023

A first-order augmented Lagrangian method for constrained minimax optimization

Zhaosong Lu, Sanyou Mei

In this paper we study a class of constrained minimax problems. In particular, we propose a first-order augmented Lagrangian method for solving them, whose subproblems turn out to be a much simpler structured minimax problem and are suitably solved by a first-order method developed in this paper. Under some suitable assumptions, an \emph{operation complexity} of $O(\varepsilon^{-4}\log\varepsilon^{-1})$, measured by its fundamental operations, is established for the first-order augmented Lagrangian method for finding an $\varepsilon$-KKT solution of the constrained minimax problems.

10.3OCJun 2, 2022

Accelerated first-order methods for convex optimization with locally Lipschitz continuous gradient

Zhaosong Lu, Sanyou Mei

In this paper we develop accelerated first-order methods for convex optimization with locally Lipschitz continuous gradient (LLCG), which is beyond the well-studied class of convex optimization with Lipschitz continuous gradient. In particular, we first consider unconstrained convex optimization with LLCG and propose accelerated proximal gradient (APG) methods for solving it. The proposed APG methods are equipped with a verifiable termination criterion and enjoy an operation complexity of ${\cal O}(\varepsilon^{-1/2}\log \varepsilon^{-1})$ and ${\cal O}(\log \varepsilon^{-1})$ for finding an $\varepsilon$-residual solution of an unconstrained convex and strongly convex optimization problem, respectively. We then consider constrained convex optimization with LLCG and propose an first-order proximal augmented Lagrangian method for solving it by applying one of our proposed APG methods to approximately solve a sequence of proximal augmented Lagrangian subproblems. The resulting method is equipped with a verifiable termination criterion and enjoys an operation complexity of ${\cal O}(\varepsilon^{-1}\log \varepsilon^{-1})$ and ${\cal O}(\varepsilon^{-1/2}\log \varepsilon^{-1})$ for finding an $\varepsilon$-KKT solution of a constrained convex and strongly convex optimization problem, respectively. All the proposed methods in this paper are parameter-free or almost parameter-free except that the knowledge on convexity parameter is required. In addition, preliminary numerical results are presented to demonstrate the performance of our proposed methods. To the best of our knowledge, no prior studies were conducted to investigate accelerated first-order methods with complexity guarantees for convex optimization with LLCG. All the complexity results obtained in this paper are new.

5.6OCJul 12, 2022

A Newton-CG based barrier method for finding a second-order stationary point of nonconvex conic optimization with complexity guarantees

Chuan He, Zhaosong Lu

In this paper we consider finding an approximate second-order stationary point (SOSP) of nonconvex conic optimization that minimizes a twice differentiable function over the intersection of an affine subspace and a convex cone. In particular, we propose a Newton-conjugate gradient (Newton-CG) based barrier method for finding an $(ε,\sqrtε)$-SOSP of this problem. Our method is not only implementable, but also achieves an iteration complexity of ${\cal O}(ε^{-3/2})$, which matches the best known iteration complexity of second-order methods for finding an $(ε,\sqrtε)$-SOSP of unconstrained nonconvex optimization. The operation complexity, consisting of ${\cal O}(ε^{-3/2})$ Cholesky factorizations and $\widetilde{\cal O}(ε^{-3/2}\min\{n,ε^{-1/4}\})$ other fundamental operations, is also established for our method.

6.1OCJan 10, 2023

A Newton-CG based barrier-augmented Lagrangian method for general nonconvex conic optimization

Chuan He, Heng Huang, Zhaosong Lu

In this paper we consider finding an approximate second-order stationary point (SOSP) of general nonconvex conic optimization that minimizes a twice differentiable function subject to nonlinear equality constraints and also a convex conic constraint. In particular, we propose a Newton-conjugate gradient (Newton-CG) based barrier-augmented Lagrangian method for finding an approximate SOSP of this problem. Under some mild assumptions, we show that our method enjoys a total inner iteration complexity of $\widetilde{\cal O}(ε^{-11/2})$ and an operation complexity of $\widetilde{\cal O}(ε^{-11/2}\min\{n,ε^{-5/4}\})$ for finding an $(ε,\sqrtε)$-SOSP of general nonconvex conic optimization with high probability. Moreover, under a constraint qualification, these complexity bounds are improved to $\widetilde{\cal O}(ε^{-7/2})$ and $\widetilde{\cal O}(ε^{-7/2}\min\{n,ε^{-3/4}\})$, respectively. To the best of our knowledge, this is the first study on the complexity of finding an approximate SOSP of general nonconvex conic optimization. Preliminary numerical results are presented to demonstrate superiority of the proposed method over first-order methods in terms of solution quality.

17.8OCSep 16, 2024

Variance-reduced first-order methods for deterministically constrained stochastic nonconvex optimization with strong convergence guarantees

Zhaosong Lu, Sanyou Mei, Yifeng Xiao

In this paper, we study a class of deterministically constrained stochastic optimization problems. Existing methods typically aim to find an $ε$-stochastic stationary point, where the expected violations of both constraints and first-order stationarity are within a prescribed accuracy $ε$. However, in many practical applications, it is crucial that the constraints be nearly satisfied with certainty, making such an $ε$-stochastic stationary point potentially undesirable due to the risk of significant constraint violations. To address this issue, we propose single-loop variance-reduced stochastic first-order methods, where the stochastic gradient of the stochastic component is computed using either a truncated recursive momentum scheme or a truncated Polyak momentum scheme for variance reduction, while the gradient of the deterministic component is computed exactly. Under the error bound condition with a parameter $θ\geq 1$ and other suitable assumptions, we establish that these methods respectively achieve a sample and first-order operation complexity of $\widetilde O(ε^{-\max\{θ+2, 2θ\}})$ and $\widetilde O(ε^{-\max\{4, 2θ\}})$ for finding a stronger $ε$-stochastic stationary point, where the constraint violation is within $ε$ with certainty, and the expected violation of first-order stationarity is within $ε$. For $θ=1$, these complexities reduce to $\widetilde O(ε^{-3})$ and $\widetilde O(ε^{-4})$ respectively, which match, up to a logarithmic factor, the best-known complexities achieved by existing methods for finding an $ε$-stochastic stationary point of unconstrained smooth stochastic optimization problems.

4.0OCJun 2, 2022

Primal-dual extrapolation methods for monotone inclusions under local Lipschitz continuity

Zhaosong Lu, Sanyou Mei

In this paper we consider a class of monotone inclusion (MI) problems of finding a zero of the sum of two monotone operators, in which one operator is maximal monotone while the other is {\it locally Lipschitz} continuous. We propose primal-dual extrapolation methods to solve them using a point and operator extrapolation technique, whose parameters are chosen by a backtracking line search scheme. The proposed methods enjoy an operation complexity of ${\cal O}(\log ε^{-1})$ and ${\cal O}(ε^{-1}\log ε^{-1})$, measured by the number of fundamental operations consisting only of evaluations of one operator and resolvent of the other operator, for finding an $\varepsilon$-residual solution of strongly and non-strongly MI problems, respectively. The latter complexity significantly improves the previously best operation complexity ${\cal O}(\varepsilon^{-2})$. As a byproduct, complexity results of the primal-dual extrapolation methods are also obtained for finding an $\varepsilon$-KKT or $\varepsilon$-residual solution of convex conic optimization, conic constrained saddle point, and variational inequality problems under {\it local Lipschitz} continuity. We provide preliminary numerical results to demonstrate the performance of the proposed methods.

7.1OCDec 28, 2025

A first-order method for nonconvex-strongly-concave constrained minimax optimization

Zhaosong Lu, Sanyou Mei

In this paper we study a nonconvex-strongly-concave constrained minimax problem. Specifically, we propose a first-order augmented Lagrangian method for solving it, whose subproblems are nonconvex-strongly-concave unconstrained minimax problems and suitably solved by a first-order method developed in this paper that leverages the strong concavity structure. Under suitable assumptions, the proposed method achieves an operation complexity of $O(\varepsilon^{-3.5}\log\varepsilon^{-1})$, measured in terms of its fundamental operations, for finding an $\varepsilon$-KKT solution of the constrained minimax problem, which improves the previous best-known operation complexity by a factor of $\varepsilon^{-0.5}$.

13.0OCNov 10, 2025

Solving bilevel optimization via sequential minimax optimization

Zhaosong Lu, Sanyou Mei

In this paper we propose a sequential minimax optimization (SMO) method for solving a class of constrained bilevel optimization problems in which the lower-level part is a possibly nonsmooth convex optimization problem, while the upper-level part is a possibly nonconvex optimization problem. Specifically, SMO applies a first-order method to solve a sequence of minimax subproblems, which are obtained by employing a hybrid of modified augmented Lagrangian and penalty schemes on the bilevel optimization problems. Under suitable assumptions, we establish an operation complexity of $O(\varepsilon^{-7}\log\varepsilon^{-1})$ and $O(\varepsilon^{-6}\log\varepsilon^{-1})$, measured in terms of fundamental operations, for SMO in finding an $\varepsilon$-KKT solution of the bilevel optimization problems with merely convex and strongly convex lower-level objective functions, respectively. The latter result improves the previous best-known operation complexity by a factor of $\varepsilon^{-1}$. Preliminary numerical results demonstrate significantly superior computational performance compared to the recently developed first-order penalty method.

22.0OCJun 12, 2025

Complexity of normalized stochastic first-order methods with momentum under heavy-tailed noise

Chuan He, Zhaosong Lu, Defeng Sun et al.

In this paper, we propose practical normalized stochastic first-order methods with Polyak momentum, multi-extrapolated momentum, and recursive momentum for solving unconstrained optimization problems. These methods employ dynamically updated algorithmic parameters and do not require explicit knowledge of problem-dependent quantities such as the Lipschitz constant or noise bound. We establish first-order oracle complexity results for finding approximate stochastic stationary points under heavy-tailed noise and weakly average smoothness conditions -- both of which are weaker than the commonly used bounded variance and mean-squared smoothness assumptions. Our complexity bounds either improve upon or match the best-known results in the literature. Numerical experiments are presented to demonstrate the practical effectiveness of the proposed methods.

22.6LGSep 15, 2025

Low-rank Orthogonalization for Large-scale Matrix Optimization with Applications to Foundation Model Training

Chuan He, Zhanwang Deng, Zhaosong Lu

Neural network (NN) training is inherently a large-scale matrix optimization problem, yet the matrix structure of NN parameters has long been overlooked. Recently, the optimizer Muon \cite{jordanmuon}, which explicitly exploits this structure, has gained significant attention for its strong performance in foundation model training. A key component contributing to Muon's success is matrix orthogonalization. In this paper, we propose {\it low-rank orthogonalization}, which explicitly leverages the low-rank nature of gradients during NN training. Building on this, we propose low-rank matrix-signed gradient descent and a low-rank variant of Muon. Our numerical experiments demonstrate the superior performance of low-rank orthogonalization, with the low-rank Muon achieving promising results in GPT-2 and LLaMA pretraining -- surpassing the performance of the carefully tuned vanilla Muon. Theoretically, we establish the iteration complexity of the low-rank matrix-signed gradient descent for finding an approximate stationary solution, as well as that of low-rank Muon for finding an approximate stochastic stationary solution under heavy-tailed noise.

7.1OCJul 2, 2025

A first-order method for nonconvex-nonconcave minimax problems under a local Kurdyka-Łojasiewicz condition

Zhaosong Lu, Xiangyuan Wang

We study a class of nonconvex-nonconcave minimax problems in which the inner maximization problem satisfies a local Kurdyka-Łojasiewicz (KL) condition that may vary with the outer minimization variable. In contrast to the global KL or Polyak-Łojasiewicz (PL) conditions commonly assumed in the literature -- which are significantly stronger and often too restrictive in practice -- this local KL condition accommodates a broader range of practical scenarios. However, it also introduces new analytical challenges. In particular, as an optimization algorithm progresses toward a stationary point of the problem, the region over which the KL condition holds may shrink, resulting in a more intricate and potentially ill-conditioned landscape. To address this challenge, we show that the associated maximal function is locally Hölder smooth. Leveraging this key property, we develop an inexact proximal gradient method for solving the minimax problem, where the inexact gradient of the maximal function is computed by applying a proximal gradient method to a KL-structured subproblem. Under mild assumptions, we establish complexity guarantees for computing an approximate stationary point of the minimax problem.

7.1OCJun 25, 2025

First-order methods for stochastic and finite-sum convex optimization with deterministic constraints

Zhaosong Lu, Yifeng Xiao

In this paper, we study a class of stochastic and finite-sum convex optimization problems with deterministic constraints. Existing methods typically aim to find an $ε$-$expectedly\ feasible\ stochastic\ optimal$ solution, in which the expected constraint violation and expected optimality gap are both within a prescribed tolerance $ε$. However, in many practical applications, constraints must be nearly satisfied with certainty, rendering such solutions potentially unsuitable due to the risk of substantial violations. To address this issue, we propose stochastic first-order methods for finding an $ε$-$surely\ feasible\ stochastic\ optimal$ ($ε$-SFSO) solution, where the constraint violation is deterministically bounded by $ε$ and the expected optimality gap is at most $ε$. Our methods apply an accelerated stochastic gradient (ASG) scheme or a modified variance-reduced ASG scheme $only\ once$ to a sequence of quadratic penalty subproblems with appropriately chosen penalty parameters. We establish first-order oracle complexity bounds for the proposed methods in computing an $ε$-SFSO solution. As a byproduct, we also derive first-order oracle complexity results for sample average approximation method in computing an $ε$-SFSO solution of the stochastic optimization problem using our proposed methods to solve the sample average problem.

13.0OCMar 29, 2025

Nested Stochastic Algorithm for Generalized Sinkhorn distance-Regularized Distributionally Robust Optimization

Yufeng Yang, Yi Zhou, Zhaosong Lu

Distributionally robust optimization (DRO) is a powerful technique to train robust models against data distribution shift. This paper aims to solve regularized nonconvex DRO problems, where the uncertainty set is modeled by a so-called generalized Sinkhorn distance and the loss function is nonconvex and possibly unbounded. Such a distance allows to model uncertainty of distributions with different probability supports and divergence functions. For this class of regularized DRO problems, we derive a novel dual formulation taking the form of nested stochastic optimization, where the dual variable depends on the data sample. To solve the dual problem, we provide theoretical evidence to design a nested stochastic gradient descent (SGD) algorithm, which leverages stochastic approximation to estimate the nested stochastic gradients. We study the convergence rate of nested SGD and establish polynomial iteration and sample complexities that are independent of the data size and parameter dimension, indicating its potential for solving large-scale DRO problems. We conduct numerical experiments to demonstrate the efficiency and robustness of the proposed algorithm.

4.1OCOct 1, 2025

A first-order method for constrained nonconvex--nonconcave minimax problems under a local Kurdyka-Łojasiewicz condition

Zhaosong Lu, Xiangyuan Wang

We study a class of constrained nonconvex--nonconcave minimax problems in which the inner maximization involves potentially complex constraints. Under the assumption that the inner problem of a novel lifted minimax problem satisfies a local Kurdyka-Łojasiewicz (KL) condition, we show that the maximal function of the original problem enjoys a local Hölder smoothness property. We also propose a sequential convex programming (SCP) method for solving constrained optimization problems and establish its convergence rate under a local KL condition. Leveraging these results, we develop an inexact proximal gradient method for the original minimax problem, where the inexact gradient of the maximal function is computed via the SCP method applied to a locally KL-structured subproblem. Finally, we establish complexity guarantees for the proposed method in computing an approximate stationary point of the original minimax problem.

11.3OCOct 10, 2016

Stochastic Alternating Direction Method of Multipliers with Variance Reduction for Nonconvex Optimization

Feihu Huang, Songcan Chen, Zhaosong Lu

In the paper, we study the stochastic alternating direction method of multipliers (ADMM) for the nonconvex optimizations, and propose three classes of the nonconvex stochastic ADMM with variance reduction, based on different reduced variance stochastic gradients. Specifically, the first class called the nonconvex stochastic variance reduced gradient ADMM (SVRG-ADMM), uses a multi-stage scheme to progressively reduce the variance of stochastic gradients. The second is the nonconvex stochastic average gradient ADMM (SAG-ADMM), which additionally uses the old gradients estimated in the previous iteration. The third called SAGA-ADMM is an extension of the SAG-ADMM method. Moreover, under some mild conditions, we establish the iteration complexity bound of $O(1/ε)$ of the proposed methods to obtain an $ε$-stationary solution of the nonconvex optimizations. In particular, we provide a general framework to analyze the iteration complexity of these nonconvex stochastic ADMM methods with variance reduction. Finally, some numerical experiments demonstrate the effectiveness of our methods.

2.9OCJul 1, 2016

Randomized block proximal damped Newton method for composite self-concordant minimization

Zhaosong Lu

In this paper we consider the composite self-concordant (CSC) minimization problem, which minimizes the sum of a self-concordant function $f$ and a (possibly nonsmooth) proper closed convex function $g$. The CSC minimization is the cornerstone of the path-following interior point methods for solving a broad class of convex optimization problems. It has also found numerous applications in machine learning. The proximal damped Newton (PDN) methods have been well studied in the literature for solving this problem that enjoy a nice iteration complexity. Given that at each iteration these methods typically require evaluating or accessing the Hessian of $f$ and also need to solve a proximal Newton subproblem, the cost per iteration can be prohibitively high when applied to large-scale problems. Inspired by the recent success of block coordinate descent methods, we propose a randomized block proximal damped Newton (RBPDN) method for solving the CSC minimization. Compared to the PDN methods, the computational cost per iteration of RBPDN is usually significantly lower. The computational experiment on a class of regularized logistic regression problems demonstrate that RBPDN is indeed promising in solving large-scale CSC minimization problems. The convergence of RBPDN is also analyzed in the paper. In particular, we show that RBPDN is globally convergent when $g$ is Lipschitz continuous. It is also shown that RBPDN enjoys a local linear convergence. Moreover, we show that for a class of $g$ including the case where $g$ is Lipschitz differentiable, RBPDN enjoys a global linear convergence. As a striking consequence, it shows that the classical damped Newton methods [22,40] and the PDN [31] for such $g$ are globally linearly convergent, which was previously unknown in the literature. Moreover, this result can be used to sharpen the existing iteration complexity of these methods.

5.8OCNov 24, 2015

Generalized Conjugate Gradient Methods for $\ell_1$ Regularized Convex Quadratic Programming with Finite Convergence

Zhaosong Lu, Xiaojun Chen

The conjugate gradient (CG) method is an efficient iterative method for solving large-scale strongly convex quadratic programming (QP). In this paper we propose some generalized CG (GCG) methods for solving the $\ell_1$-regularized (possibly not strongly) convex QP that terminate at an optimal solution in a finite number of iterations. At each iteration, our methods first identify a face of an orthant and then either perform an exact line search along the direction of the negative projected minimum-norm subgradient of the objective function or execute a CG subroutine that conducts a sequence of CG iterations until a CG iterate crosses the boundary of this face or an approximate minimizer of over this face or a subface is found. We determine which type of step should be taken by comparing the magnitude of some components of the minimum-norm subgradient of the objective function to that of its rest components. Our analysis on finite convergence of these methods makes use of an error bound result and some key properties of the aforementioned exact line search and the CG subroutine. We also show that the proposed methods are capable of finding an approximate solution of the problem by allowing some inexactness on the execution of the CG subroutine. The overall arithmetic operation cost of our GCG methods for finding an $ε$-optimal solution depends on $ε$ in $O(\log(1/ε))$, which is superior to the accelerated proximal gradient method [2,23] that depends on $ε$ in $O(1/\sqrtε)$. In addition, our GCG methods can be extended straightforwardly to solve box-constrained convex QP with finite convergence. Numerical results demonstrate that our methods are very favorable for solving ill-conditioned problems.

7.9OCNov 23, 2015

Sparse Recovery via Partial Regularization: Models, Theory and Algorithms

Zhaosong Lu, Xiaorui Li

In the context of sparse recovery, it is known that most of existing regularizers such as $\ell_1$ suffer from some bias incurred by some leading entries (in magnitude) of the associated vector. To neutralize this bias, we propose a class of models with partial regularizers for recovering a sparse solution of a linear system. We show that every local minimizer of these models is sufficiently sparse or the magnitude of all its nonzero entries is above a uniform constant depending only on the data of the linear system. Moreover, for a class of partial regularizers, any global minimizer of these models is a sparsest solution to the linear system. We also establish some sufficient conditions for local or global recovery of the sparsest solution to the linear system, among which one of the conditions is weaker than the best known restricted isometry property (RIP) condition for sparse recovery by $\ell_1$. In addition, a first-order feasible augmented Lagrangian (FAL) method is proposed for solving these models, in which each subproblem is solved by a nonmonotone proximal gradient (NPG) method. Despite the complication of the partial regularizers, we show that each proximal subproblem in NPG can be solved as a certain number of one-dimensional optimization problems, which usually have a closed-form solution. We also show that any accumulation point of the sequence generated by FAL is a first-order stationary point of the models. Numerical results on compressed sensing and sparse logistic regression demonstrate that the proposed models substantially outperform the widely used ones in the literature in terms of solution quality.

5.8OCSep 29, 2015

Optimization over Sparse Symmetric Sets via a Nonmonotone Projected Gradient Method

Zhaosong Lu

We consider the problem of minimizing a Lipschitz differentiable function over a class of sparse symmetric sets that has wide applications in engineering and science. For this problem, it is known that any accumulation point of the classical projected gradient (PG) method with a constant stepsize $1/L$ satisfies the $L$-stationarity optimality condition that was introduced in [3]. In this paper we introduce a new optimality condition that is stronger than the $L$-stationarity optimality condition. We also propose a nonmonotone projected gradient (NPG) method for this problem by incorporating some support-changing and coordintate-swapping strategies into a projected gradient method with variable stepsizes. It is shown that any accumulation point of NPG satisfies the new optimality condition and moreover it is a coordinatewise stationary point. Under some suitable assumptions, we further show that it is a global or a local minimizer of the problem. Numerical experiments are conducted to compare the performance of PG and NPG. The computational results demonstrate that NPG has substantially better solution quality than PG, and moreover, it is at least comparable to, but sometimes can be much faster than PG in terms of speed.

12.7OCSep 9, 2014

Penalty methods for a class of non-Lipschitz optimization problems

Xiaojun Chen, Zhaosong Lu, Ting Kei Pong

We consider a class of constrained optimization problems with a possibly nonconvex non-Lipschitz objective and a convex feasible set being the intersection of a polyhedron and a possibly degenerate ellipsoid. Such problems have a wide range of applications in data science, where the objective is used for inducing sparsity in the solutions while the constraint set models the noise tolerance and incorporates other prior information for data fitting. To solve this class of constrained optimization problems, a common approach is the penalty method. However, there is little theory on exact penalization for problems with nonconvex and non-Lipschitz objective functions. In this paper, we study the existence of exact penalty parameters regarding local minimizers, stationary points and $ε$-minimizers under suitable assumptions. Moreover, we discuss a penalty method whose subproblems are solved via a nonmonotone proximal gradient method with a suitable update scheme for the penalty parameters, and prove the convergence of the algorithm to a KKT point of the constrained problem. Preliminary numerical results demonstrate the efficiency of the penalty method for finding sparse solutions of underdetermined linear systems.

10.6LGApr 4, 2014

Orthogonal Rank-One Matrix Pursuit for Low Rank Matrix Completion

Zheng Wang, Ming-Jun Lai, Zhaosong Lu et al.

In this paper, we propose an efficient and scalable low rank matrix completion algorithm. The key idea is to extend orthogonal matching pursuit method from the vector case to the matrix case. We further propose an economic version of our algorithm by introducing a novel weight updating rule to reduce the time and storage complexity. Both versions are computationally inexpensive for each matrix pursuit iteration, and find satisfactory results in a few iterations. Another advantage of our proposed algorithm is that it has only one tunable parameter, which is the rank. It is easy to understand and to use by the user. This becomes especially important in large-scale learning problems. In addition, we rigorously show that both versions achieve a linear convergence rate, which is significantly better than the previous known results. We also empirically compare the proposed algorithms with several state-of-the-art matrix completion algorithms on many real-world datasets, including the large-scale recommendation dataset Netflix as well as the MovieLens datasets. Numerical results show that our proposed algorithm is more efficient than competing algorithms while achieving similar or better prediction performance.

9.8OCJan 5, 2014

Schatten-$p$ Quasi-Norm Regularized Matrix Optimization via Iterative Reweighted Singular Value Minimization

Zhaosong Lu, Yong Zhang

In this paper we study general Schatten-$p$ quasi-norm (SPQN) regularized matrix minimization problems. In particular, we first introduce a class of first-order stationary points for them, and show that the first-order stationary points introduced in [11] for an SPQN regularized $vector$ minimization problem are equivalent to those of an SPQN regularized $matrix$ minimization reformulation. We also show that any local minimizer of the SPQN regularized matrix minimization problems must be a first-order stationary point. Moreover, we derive lower bounds for nonzero singular values of the first-order stationary points and hence also of the local minimizers of the SPQN regularized matrix minimization problems. The iterative reweighted singular value minimization (IRSVM) methods are then proposed to solve these problems, whose subproblems are shown to have a closed-form solution. In contrast to the analogous methods for the SPQN regularized $vector$ minimization problems, the convergence analysis of these methods is significantly more challenging. We develop a novel approach to establishing the convergence of these methods, which makes use of the expression of a specific solution of their subproblems and avoids the intricate issue of finding the explicit expression for the Clarke subdifferential of the objective of their subproblems. In particular, we show that any accumulation point of the sequence generated by the IRSVM methods is a first-order stationary point of the problems. Our computational results demonstrate that the IRSVM methods generally outperform some recently developed state-of-the-art methods in terms of solution quality and/or speed.

13.9OCJun 25, 2013

A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming

Zhaosong Lu, Lin Xiao

We propose a randomized nonmonotone block proximal gradient (RNBPG) method for minimizing the sum of a smooth (possibly nonconvex) function and a block-separable (possibly nonconvex nonsmooth) function. At each iteration, this method randomly picks a block according to any prescribed probability distribution and solves typically several associated proximal subproblems that usually have a closed-form solution, until a certain progress on objective value is achieved. In contrast to the usual randomized block coordinate descent method [23,20], our method has a nonmonotone flavor and uses variable stepsizes that can partially utilize the local curvature information of the smooth component of objective function. We show that any accumulation point of the solution sequence of the method is a stationary point of the problem {\it almost surely} and the method is capable of finding an approximate stationary point with high probability. We also establish a sublinear rate of convergence for the method in terms of the minimal expected squared norm of certain proximal gradients over the iterations. When the problem under consideration is convex, we show that the expected objective values generated by RNBPG converge to the optimal value of the problem. Under some assumptions, we further establish a sublinear and linear rate of convergence on the expected objective values generated by a monotone version of RNBPG. Finally, we conduct some preliminary experiments to test the performance of RNBPG on the $\ell_1$-regularized least-squares problem and a dual SVM problem in machine learning. The computational results demonstrate that our method substantially outperforms the randomized block coordinate {\it descent} method with fixed or variable stepsizes.

30.5OCMay 21, 2013

On the Complexity Analysis of Randomized Block-Coordinate Descent Methods

Zhaosong Lu, Lin Xiao

In this paper we analyze the randomized block-coordinate descent (RBCD) methods proposed in [8,11] for minimizing the sum of a smooth convex function and a block-separable convex function. In particular, we extend Nesterov's technique developed in [8] for analyzing the RBCD method for minimizing a smooth convex function over a block-separable closed convex set to the aforementioned more general problem and obtain a sharper expected-value type of convergence rate than the one implied in [11]. Also, we obtain a better high-probability type of iteration complexity, which improves upon the one in [11] by at least the amount $O(n/ε)$, where $ε$ is the target solution accuracy and $n$ is the number of problem blocks. In addition, for unconstrained smooth convex minimization, we develop a new technique called {\it randomized estimate sequence} to analyze the accelerated RBCD method proposed by Nesterov [11] and establish a sharper expected-value type of convergence rate than the one given in [11].

26.5LGMar 18, 2013

A General Iterative Shrinkage and Thresholding Algorithm for Non-convex Regularized Optimization Problems

Pinghua Gong, Changshui Zhang, Zhaosong Lu et al.

Non-convex sparsity-inducing penalties have recently received considerable attentions in sparse learning. Recent theoretical investigations have demonstrated their superiority over the convex counterparts in several sparse learning settings. However, solving the non-convex optimization problems associated with non-convex penalties remains a big challenge. A commonly used approach is the Multi-Stage (MS) convex relaxation (or DC programming), which relaxes the original non-convex problem to a sequence of convex problems. This approach is usually not very practical for large-scale problems because its computational cost is a multiple of solving a single convex problem. In this paper, we propose a General Iterative Shrinkage and Thresholding (GIST) algorithm to solve the nonconvex optimization problem for a large class of non-convex penalties. The GIST algorithm iteratively solves a proximal operator problem, which in turn has a closed-form solution for many commonly used penalties. At each outer iteration of the algorithm, we use a line search initialized by the Barzilai-Borwein (BB) rule that allows finding an appropriate step size quickly. The paper also presents a detailed convergence analysis of the GIST algorithm. The efficiency of the proposed algorithm is demonstrated by extensive experiments on large-scale data sets.

4.6OCOct 31, 2012

Iterative Hard Thresholding Methods for $l_0$ Regularized Convex Cone Programming

Zhaosong Lu

In this paper we consider $l_0$ regularized convex cone programming problems. In particular, we first propose an iterative hard thresholding (IHT) method and its variant for solving $l_0$ regularized box constrained convex programming. We show that the sequence generated by these methods converges to a local minimizer. Also, we establish the iteration complexity of the IHT method for finding an $ε$-local-optimal solution. We then propose a method for solving $l_0$ regularized convex cone programming by applying the IHT method to its quadratic penalty relaxation and establish its iteration complexity for finding an $ε$-approximate local minimizer. Finally, we propose a variant of this method in which the associated penalty parameter is dynamically updated, and show that every accumulation point is a local minimizer of the problem.

9.2OCOct 10, 2012

Sequential Convex Programming Methods for A Class of Structured Nonlinear Programming

Zhaosong Lu

In this paper we study a broad class of structured nonlinear programming (SNLP) problems. In particular, we first establish the first-order optimality conditions for them. Then we propose sequential convex programming (SCP) methods for solving them in which each iteration is obtained by solving a convex programming problem. Under some suitable assumptions, we establish that any accumulation point of the sequence generated by the methods is a KKT point of the SNLP problems. In addition, we propose a variant of the SCP method for SNLP in which nonmonotone scheme and ``local'' Lipschitz constants of the associated functions are used. A similar convergence result as mentioned above is established.

15.6OCSep 29, 2012

Iterative Reweighted Minimization Methods for $l_p$ Regularized Unconstrained Nonlinear Programming

Zhaosong Lu

In this paper we study general $l_p$ regularized unconstrained minimization problems. In particular, we derive lower bounds for nonzero entries of first- and second-order stationary points, and hence also of local minimizers of the $l_p$ minimization problems. We extend some existing iterative reweighted $l_1$ (IRL1) and $l_2$ (IRL2) minimization methods to solve these problems and proposed new variants for them in which each subproblem has a closed form solution. Also, we provide a unified convergence analysis for these methods. In addition, we propose a novel Lipschitz continuous $ε$-approximation to $\|x\|^p_p$. Using this result, we develop new IRL1 methods for the $l_p$ minimization problems and showed that any accumulation point of the sequence generated by these methods is a first-order stationary point, provided that the approximation parameter $ε$ is below a computable threshold value. This is a remarkable result since all existing iterative reweighted minimization methods require that $ε$ be dynamically updated and approach zero. Our computational results demonstrate that the new IRL1 method is generally more stable than the existing IRL1 methods [21,18] in terms of objective function value and CPU time.

16.2LGSep 10, 2012

Fused Multiple Graphical Lasso

Sen Yang, Zhaosong Lu, Xiaotong Shen et al.

In this paper, we consider the problem of estimating multiple graphical models simultaneously using the fused lasso penalty, which encourages adjacent graphs to share similar structures. A motivating example is the analysis of brain networks of Alzheimer's disease using neuroimaging data. Specifically, we may wish to estimate a brain network for the normal controls (NC), a brain network for the patients with mild cognitive impairment (MCI), and a brain network for Alzheimer's patients (AD). We expect the two brain networks for NC and MCI to share common structures but not to be identical to each other; similarly for the two brain networks for MCI and AD. The proposed formulation can be solved using a second-order method. Our key technical contribution is to establish the necessary and sufficient condition for the graphs to be decomposable. Based on this key property, a simple screening rule is presented, which decomposes the large graphs into small subgraphs and allows an efficient estimation of multiple independent (small) subgraphs, dramatically reducing the computational cost. We perform experiments on both synthetic and real data; our results demonstrate the effectiveness and efficiency of the proposed approach.

16.2LGMay 10, 2012

Sparse Approximation via Penalty Decomposition Methods

Zhaosong Lu, Yong Zhang

In this paper we consider sparse approximation problems, that is, general $l_0$ minimization problems with the $l_0$-"norm" of a vector being a part of constraints or objective function. In particular, we first study the first-order optimality conditions for these problems. We then propose penalty decomposition (PD) methods for solving them in which a sequence of penalty subproblems are solved by a block coordinate descent (BCD) method. Under some suitable assumptions, we establish that any accumulation point of the sequence generated by the PD methods satisfies the first-order optimality conditions of the problems. Furthermore, for the problems in which the $l_0$ part is the only nonconvex part, we show that such an accumulation point is a local minimizer of the problems. In addition, we show that any accumulation point of the sequence generated by the BCD method is a saddle point of the penalty subproblem. Moreover, for the problems in which the $l_0$ part is the only nonconvex part, we establish that such an accumulation point is a local minimizer of the penalty subproblem. Finally, we test the performance of our PD methods by applying them to sparse logistic regression, sparse inverse covariance selection, and compressed sensing problems. The computational results demonstrate that our methods generally outperform the existing methods in terms of solution quality and/or speed.