Shuzhong Zhang

h-index: 8

14 papers

695 citations

Novelty: 55%

AI Score: 42

Ranked #83,333 of 205,806 authors (top 40%); #260 in OC (top 30%)

14 Papers

OC · Nov 6, 2016

Tensor and Its Tucker Core: the Invariance Relationships

Bo Jiang, Fan Yang, Shuzhong Zhang

In [13], Hillar and Lim famously demonstrated that "multilinear (tensor) analogues of many efficiently computable problems in numerical linear algebra are NP-hard". Despite many recent advancements, the state-of-the-art methods for computing such 'tensor analogues' still suffer severely from the curse of dimensionality. In this paper we show that the Tucker core of a tensor, however, retains many properties of the original tensor, including the CP rank, the border rank, the tensor Schatten quasi norms, and the Z-eigenvalues. When the core tensor is smaller than the original tensor, this property leads to considerable computational advantages as confirmed by our numerical experiments. In our analysis, we in fact work with a generalized Tucker-like decomposition that can accommodate any full column-rank factor matrices.
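As a rough illustration of the size reduction this abstract refers to, the following minimal sketch computes a Tucker core by a truncated higher-order SVD in NumPy. It is not the authors' code: the helper names (`mode_unfold`, `mode_product`, `tucker_core`), the tensor sizes, and the random test tensor are assumptions made for illustration. It checks only exact reconstruction and the Frobenius norm, not the CP rank, border rank, Schatten quasi norms, or Z-eigenvalues studied in the paper.

```python
import numpy as np

def mode_unfold(T, mode):
    """Matricize T so that axis `mode` indexes the rows."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """Mode product: multiply matrix M (rows x T.shape[mode]) into axis `mode` of T."""
    out = np.tensordot(M, T, axes=(1, mode))  # M's row axis comes first in the result
    return np.moveaxis(out, 0, mode)          # move it back to position `mode`

def tucker_core(T, ranks):
    """Truncated HOSVD: orthonormal factors U_k and core G = T x_1 U_1^T ... x_d U_d^T."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(mode_unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = T
    for mode, U in enumerate(factors):
        core = mode_product(core, U.T, mode)
    return core, factors

# Hypothetical test tensor: 6 x 7 x 8 with multilinear rank (2, 2, 2).
rng = np.random.default_rng(0)
T = rng.standard_normal((2, 2, 2))
for mode, n in enumerate((6, 7, 8)):
    T = mode_product(T, rng.standard_normal((n, 2)), mode)

core, factors = tucker_core(T, (2, 2, 2))

# Rebuild the tensor from the small core and the factor matrices.
T_rebuilt = core
for mode, U in enumerate(factors):
    T_rebuilt = mode_product(T_rebuilt, U, mode)

print(core.shape)                  # the core has 8 entries, the tensor has 336
print(np.allclose(T, T_rebuilt))   # reconstruction is exact up to rounding
print(np.isclose(np.linalg.norm(T), np.linalg.norm(core)))  # orthonormal factors keep the norm
```

Any computation that depends only on quantities shared by the tensor and its core could then, in principle, be run on the much smaller `core`.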

OC · Aug 30, 2023

A Unified Analysis on the Subgradient Upper Bounds for the Subgradient Methods Minimizing Composite Nonconvex, Nonsmooth and Non-Lipschitz Functions

Daoli Zhu, Lei Zhao, Shuzhong Zhang

This paper presents a unified analysis for the proximal subgradient method (Prox-SubGrad) type approach to minimize an overall objective of $f(x)+r(x)$, subject to convex constraints, where both $f$ and $r$ are weakly convex, nonsmooth, and non-Lipschitz. Leveraging the properties of the Moreau envelope of weakly convex functions, we are able to relate error-bound conditions, the growth conditions of the subgradients of the objective, and the behavior of the proximal subgradient iterates on some remarkably broad classes of objective functions. Various existing as well as new bounding conditions are studied, leading to novel iteration complexity results. The terrain of our exploration expands to stochastic proximal subgradient algorithms.
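For reference, the Moreau envelope mentioned in this abstract is usually defined as follows. This is the standard textbook definition for a $\rho$-weakly convex function $\varphi$ with parameter $0<\lambda<1/\rho$. The symbols $\varphi$, $\rho$, and $\lambda$ are notation assumed here rather than taken from the paper.

```latex
\varphi_\lambda(x) = \min_{y}\Big\{ \varphi(y) + \frac{1}{2\lambda}\|y-x\|^2 \Big\},
\qquad
\operatorname{prox}_{\lambda\varphi}(x) = \arg\min_{y}\Big\{ \varphi(y) + \frac{1}{2\lambda}\|y-x\|^2 \Big\},
\qquad
\nabla \varphi_\lambda(x) = \frac{1}{\lambda}\big(x - \operatorname{prox}_{\lambda\varphi}(x)\big).
```

A small $\|\nabla\varphi_\lambda(x)\|$ indicates that $x$ is close to a nearly stationary point of $\varphi$, which is why the envelope is a common stationarity measure for weakly convex problems.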

OC · Jun 2, 2023

An Augmented Lagrangian Approach to Conically Constrained Non-monotone Variational Inequality Problems

Lei Zhao, Daoli Zhu, Shuzhong Zhang

In this paper we consider a non-monotone (mixed) variational inequality model with (nonlinear) convex conic constraints. Through developing an equivalent Lagrangian function-like primal-dual saddle-point system for the VI model in question, we introduce an augmented Lagrangian primal-dual method, to be called ALAVI in the current paper, for solving a general constrained VI model. Under an assumption, to be called the primal-dual variational coherence condition in the paper, we prove the convergence of ALAVI. Next, we show that many existing generalized monotonicity properties are sufficient -- though by no means necessary -- to imply the above mentioned coherence condition, thus are sufficient to ensure convergence of ALAVI. Under that assumption, we further show that ALAVI has in fact an $o(1/\sqrt{k})$ global rate of convergence where $k$ is the iteration count. By introducing a new gap function, this rate further improves to be $O(1/k)$ if the mapping is monotone. Finally, we show that under a metric subregularity condition, even if the VI model may be non-monotone the local convergence rate of ALAVI improves to be linear. Numerical experiments on some randomly generated highly nonlinear and non-monotone VI problems show practical efficacy of the newly proposed method.

OC · Sep 2, 2022

Cubic-Regularized Newton for Spectral Constrained Matrix Optimization and its Application to Fairness

Casey Garner, Gilad Lerman, Shuzhong Zhang

Matrix functions are utilized to rewrite smooth spectral constrained matrix optimization problems as smooth unconstrained problems over the set of symmetric matrices which are then solved via the cubic-regularized Newton method. A second-order chain rule identity for matrix functions is proven to compute the higher-order derivatives to implement cubic-regularized Newton, and a new convergence analysis is provided for cubic-regularized Newton for matrix vector spaces. We demonstrate the applicability of our approach by conducting numerical experiments on both synthetic and real datasets. In our experiments, we formulate a new model for estimating fair and robust covariance matrices in the spirit of Tyler's M-estimator (TME) model and demonstrate its advantage.
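For orientation, the classical cubic-regularized Newton step (in the Nesterov-Polyak form, for a smooth $f$ on a Euclidean space) is sketched below. The constant $M>0$ and the vector-space notation are assumptions. The paper works over symmetric matrices, and its exact formulation may differ.

```latex
s_k \in \arg\min_{s}\Big\{ \langle \nabla f(x_k), s\rangle
      + \tfrac{1}{2}\langle \nabla^2 f(x_k)\, s, s\rangle
      + \tfrac{M}{6}\|s\|^3 \Big\},
\qquad x_{k+1} = x_k + s_k .
```

Evaluating this model requires first and second derivatives of the objective, which is where a second-order chain rule for matrix functions would enter.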

LG · Feb 11

Natural Hypergradient Descent: Algorithm Design, Convergence Analysis, and Parallel Implementation

Deyi Kong, Zaiwei Chen, Shuzhong Zhang et al.

In this work, we propose Natural Hypergradient Descent (NHGD), a new method for solving bilevel optimization problems. To address the computational bottleneck in hypergradient estimation--namely, the need to compute or approximate the Hessian inverse--we exploit the statistical structure of the inner optimization problem and use the empirical Fisher information matrix as an asymptotically consistent surrogate for the Hessian. This design enables a parallel optimize-and-approximate framework in which the Hessian-inverse approximation is updated synchronously with the stochastic inner optimization, reusing gradient information at negligible additional cost. Our main theoretical contribution establishes high-probability error bounds and sample complexity guarantees for NHGD that match those of state-of-the-art optimize-then-approximate methods, while significantly reducing computational time overhead. Empirical evaluations on representative bilevel learning tasks further demonstrate the practical advantages of NHGD, highlighting its scalability and effectiveness in large-scale machine learning settings.
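The bottleneck described in this abstract can be seen from the standard hypergradient formula for a bilevel problem $\min_x F(x) = f(x, y^*(x))$ with $y^*(x) = \arg\min_y g(x,y)$, where $g(x,\cdot)$ is assumed strongly convex. The symbols $f$, $g$, $F$, and $y^*$ are generic notation assumed here, not necessarily the paper's.

```latex
\nabla F(x) = \nabla_x f\big(x, y^*(x)\big)
  - \nabla^2_{xy} g\big(x, y^*(x)\big)\,
    \big[\nabla^2_{yy} g\big(x, y^*(x)\big)\big]^{-1}
    \nabla_y f\big(x, y^*(x)\big).
```

The inverse of $\nabla^2_{yy} g$ is the term that, per the abstract, NHGD approximates with an empirical Fisher information surrogate.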

LG · Jun 29, 2020

Binary Random Projections with Controllable Sparsity Patterns

Wenye Li, Shuzhong Zhang

Random projection is often used to project higher-dimensional vectors onto a lower-dimensional space, while approximately preserving their pairwise distances. It has emerged as a powerful tool in various data processing tasks and has attracted considerable research interest. Partly motivated by the recent discoveries in neuroscience, in this paper we study the problem of random projection using binary matrices with controllable sparsity patterns. Specifically, we propose two sparse binary projection models that work on general data vectors. Compared with the conventional random projection models with dense projection matrices, our proposed models enjoy significant computational advantages due to their sparsity structure, as well as improved accuracies in empirical evaluations.

CV · Feb 6, 2018

Highly accurate model for prediction of lung nodule malignancy with CT scans

Jason Causey, Junyu Zhang, Shiqian Ma et al.

Computed tomography (CT) examinations are commonly used to predict lung nodule malignancy in patients, which are shown to improve noninvasive early diagnosis of lung cancer. It remains challenging for computational approaches to achieve performance comparable to experienced radiologists. Here we present NoduleX, a systematic approach to predict lung nodule malignancy from CT data, based on deep learning convolutional neural networks (CNN). For training and validation, we analyze >1000 lung nodules in images from the LIDC/IDRI cohort. All nodules were identified and classified by four experienced thoracic radiologists who participated in the LIDC project. NoduleX achieves high accuracy for nodule malignancy classification, with an AUC of ~0.99. This is commensurate with the analysis of the dataset by experienced radiologists. Our approach, NoduleX, provides an effective framework for highly accurate nodule malignancy prediction with the model trained on a large patient population. Our results are replicable with software available at http://bioinformatics.astate.edu/NoduleX.

OC · Oct 5, 2017

Primal-Dual Optimization Algorithms over Riemannian Manifolds: an Iteration Complexity Analysis

Junyu Zhang, Shiqian Ma, Shuzhong Zhang

In this paper we study nonconvex and nonsmooth multi-block optimization over Riemannian manifolds with coupled linear constraints. Such optimization problems naturally arise from machine learning, statistical learning, compressive sensing, image processing, and tensor PCA, among others. We develop an ADMM-like primal-dual approach based on decoupled solvable subroutines such as linearized proximal mappings. First, we introduce the optimality conditions for the aforementioned optimization models. Then, the notion of $ε$-stationary solutions is introduced as a result. The main part of the paper is to show that the proposed algorithms enjoy an iteration complexity of $O(1/ε^2)$ to reach an $ε$-stationary solution. For prohibitively large-size tensor or machine learning models, we present a sampling-based stochastic algorithm with the same iteration complexity bound in expectation. In case the subproblems are not analytically solvable, a feasible curvilinear line-search variant of the algorithm based on retraction operators is proposed. Finally, we show specifically how the algorithms can be implemented to solve a variety of practical problems such as the NP-hard maximum bisection problem, the $\ell_q$ regularized sparse tensor principal component analysis and the community detection problem. Our preliminary numerical results show great potential of the proposed methods.

OC · Feb 17, 2017

Accelerated Primal-Dual Proximal Block Coordinate Updating Methods for Constrained Convex Optimization

Yangyang Xu, Shuzhong Zhang

Block Coordinate Update (BCU) methods enjoy low per-update computational complexity because every time only one or a few block variables would need to be updated among possibly a large number of blocks. They are also easily parallelized and thus have been particularly popular for solving problems involving large-scale datasets and/or variables. In this paper, we propose a primal-dual BCU method for solving linearly constrained convex programs in multi-block variables. The method is an accelerated version of a primal-dual algorithm proposed by the authors, which applies randomization in selecting block variables to update and establishes an $O(1/t)$ convergence rate under a weak convexity assumption. We show that the rate can be accelerated to $O(1/t^2)$ if the objective is strongly convex. In addition, if one block variable is independent of the others in the objective, we then show that the algorithm can be modified to achieve a linear rate of convergence. The numerical experiments show that the accelerated method performs stably with a single set of parameters while the original method needs to tune the parameters for different datasets in order to achieve a comparable level of performance.

OC · May 19, 2016

Randomized Primal-Dual Proximal Block Coordinate Updates

Xiang Gao, Yangyang Xu, Shuzhong Zhang

In this paper we propose a randomized primal-dual proximal block coordinate updating framework for a general multi-block convex optimization model with coupled objective function and linear constraints. Assuming mere convexity, we establish its $O(1/t)$ convergence rate in terms of the objective value and feasibility measure. The framework includes several existing algorithms as special cases such as a primal-dual method for bilinear saddle-point problems (PD-S), the proximal Jacobian ADMM (Prox-JADMM) and a randomized variant of the ADMM method for multi-block convex optimization. Our analysis recovers and/or strengthens the convergence properties of several existing algorithms. For example, for PD-S our result leads to the same order of convergence rate without the previously assumed boundedness condition on the constraint sets, and for Prox-JADMM the new result provides convergence rate in terms of the objective value and the feasibility violation. It is well known that the original ADMM may fail to converge when the number of blocks exceeds two. Our result shows that if an appropriate randomization procedure is invoked to select the updating blocks, then a sublinear rate of convergence in expectation can be guaranteed for multi-block ADMM, without assuming any strong convexity. The new approach is also extended to solve problems where only a stochastic approximation of the (sub-)gradient of the objective is available, and we establish an $O(1/\sqrt{t})$ convergence rate of the extended approach for solving stochastic programming.

OC · May 9, 2016

Structured Nonconvex and Nonsmooth Optimization: Algorithms and Iteration Complexity Analysis

Bo Jiang, Tianyi Lin, Shiqian Ma et al.

Nonconvex and nonsmooth optimization problems are frequently encountered in much of statistics, business, science and engineering, but they are not yet widely recognized as a technology in the sense of scalability. A reason for this relatively low degree of popularity is the lack of a well developed system of theory and algorithms to support the applications, as is the case for its convex counterpart. This paper aims to take one step in the direction of disciplined nonconvex and nonsmooth optimization. In particular, we consider in this paper some constrained nonconvex optimization models in block decision variables, with or without coupled affine constraints. In the case without coupled constraints, we show a sublinear rate of convergence to an $ε$-stationary solution in the form of variational inequality for a generalized conditional gradient method, where the convergence rate is shown to be dependent on the Hölderian continuity of the gradient of the smooth part of the objective. For the model with coupled affine constraints, we introduce corresponding $ε$-stationarity conditions, and apply two proximal-type variants of the ADMM to solve such a model, assuming the proximal ADMM updates can be implemented for all the block variables except for the last block, for which either a gradient step or a majorization-minimization step is implemented. We show an iteration complexity bound of $O(1/ε^2)$ to reach an $ε$-stationary solution for both algorithms. Moreover, we show that the same iteration complexity of a proximal BCD method follows immediately. Numerical results are provided to illustrate the efficacy of the proposed algorithms for tensor robust PCA.

OC · May 16, 2015

Global Convergence of Unmodified 3-Block ADMM for a Class of Convex Minimization Problems

Tianyi Lin, Shiqian Ma, Shuzhong Zhang

The alternating direction method of multipliers (ADMM) has been successfully applied to solve structured convex optimization problems due to its superior practical performance. The convergence properties of the 2-block ADMM have been studied extensively in the literature. Specifically, it has been proven that the 2-block ADMM globally converges for any penalty parameter $γ>0$. In this sense, the 2-block ADMM allows the parameter to be free, i.e., there is no need to restrict the value for the parameter when implementing this algorithm in order to ensure convergence. However, for the 3-block ADMM, Chen et al. \cite{Chen-admm-failure-2013} recently constructed a counter-example showing that it can diverge if no further condition is imposed. The existing results on studying further sufficient conditions on guaranteeing the convergence of the 3-block ADMM usually require $γ$ to be smaller than a certain bound, which is usually either difficult to compute or too small to make it a practical algorithm. In this paper, we show that the 3-block ADMM still globally converges with any penalty parameter $γ>0$ if the third function $f_3$ in the objective is smooth and strongly convex, and its condition number is in $[1,1.0798)$, besides some other mild conditions. This requirement covers an important class of problems to be called regularized least squares decomposition (RLSD) in this paper.
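To make the setting concrete, here is a minimal sketch of the direct (unmodified) 3-block ADMM on a toy separable problem, $\min\ \lambda_1\|x_1\|_1 + \frac{\mu}{2}\|x_2\|^2 + \frac{1}{2}\|x_3\|^2$ subject to $x_1+x_2+x_3=b$. The toy problem, the function names `soft_threshold` and `admm_3block`, and all parameter values are assumptions chosen for illustration, not taken from the paper. In this example $f_3$ has condition number 1.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal mapping of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_3block(b, lam1=0.5, mu=1.0, gamma=1.0, iters=200):
    """Gauss-Seidel sweep over x1, x2, x3, followed by a multiplier update."""
    x1 = np.zeros_like(b)
    x2 = np.zeros_like(b)
    x3 = np.zeros_like(b)
    y = np.zeros_like(b)  # multiplier for the constraint x1 + x2 + x3 = b
    for _ in range(iters):
        # Each block minimizes the augmented Lagrangian with the others fixed.
        x1 = soft_threshold(b - x2 - x3 + y / gamma, lam1 / gamma)
        x2 = gamma * (b - x1 - x3 + y / gamma) / (mu + gamma)
        x3 = gamma * (b - x1 - x2 + y / gamma) / (1.0 + gamma)
        # Dual ascent step with step size equal to the penalty parameter gamma.
        y = y - gamma * (x1 + x2 + x3 - b)
    return x1, x2, x3, y

b = np.array([3.0, 0.4, -2.0])
x1, x2, x3, y = admm_3block(b)
print("x1 =", x1)
print("x2 =", x2)
print("x3 =", x3)
print("constraint residual =", np.linalg.norm(x1 + x2 + x3 - b))
```

With these values the iterates approach $x_1 = (2, 0, -1)$ and $x_2 = x_3 = (0.5, 0.2, -0.5)$, which satisfy the optimality conditions of the toy problem. Changing `gamma` is a simple way to experiment with the penalty parameter on this example.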

OC · Jan 27, 2013

An Extragradient-Based Alternating Direction Method for Convex Minimization

Tianyi Lin, Shiqian Ma, Shuzhong Zhang

In this paper, we consider the problem of minimizing the sum of two convex functions subject to linear linking constraints. The classical alternating direction type methods usually assume that the two convex functions have relatively easy proximal mappings. However, many problems arising from statistics, image processing and other fields have the structure that while one of the two functions has easy proximal mapping, the other function is smoothly convex but does not have an easy proximal mapping. Therefore, the classical alternating direction methods cannot be applied. To deal with the difficulty, we propose in this paper an alternating direction method based on extragradients. Under the assumption that the smooth function has a Lipschitz continuous gradient, we prove that the proposed method returns an $ε$-optimal solution within $O(1/ε)$ iterations. We apply the proposed method to solve a new statistical model called fused logistic regression. Our numerical experiments show that the proposed method performs very well when solving the test problems. We also test the performance of the proposed method through solving the lasso problem arising from statistics and compare the result with several existing efficient solvers for this problem; the results are very encouraging indeed.

OC · Jan 2, 2007

Semidefinite Relaxation Bounds for Indefinite Homogeneous Quadratic Optimization

Simai He, Zhi-Quan Luo, Jiawang Nie et al.

In this paper we study the relationship between the optimal value of a homogeneous quadratic optimization problem and that of its Semidefinite Programming (SDP) relaxation. We consider two quadratic optimization models: (1) $\min \{x^* C x \mid x^* A_k x \ge 1, x\in\mathbb{F}^n, k=0,1,...,m\}$; and (2) $\max \{x^* C x \mid x^* A_k x \le 1, x\in\mathbb{F}^n, k=0,1,...,m\}$. If one of $A_k$'s is indefinite while others and $C$ are positive semidefinite, we prove that the ratio between the optimal value of (1) and its SDP relaxation is upper bounded by $O(m^2)$ when $\mathbb{F}$ is the real line $\mathbb{R}$, and by $O(m)$ when $\mathbb{F}$ is the complex plane $\mathbb{C}$. This result is an extension of the recent work of Luo et al. \cite{LSTZ}. For (2), we show that the same ratio is bounded from below by $O(1/\log m)$ for both the real and complex case, whenever all but one of $A_k$'s are positive semidefinite while $C$ can be indefinite. This result improves the so-called approximate S-Lemma of Ben-Tal et al. \cite{BNR02}. We also consider (2) with multiple indefinite quadratic constraints and derive a general bound in terms of the problem data and the SDP solution. Throughout the paper, we present examples showing that all of our results are essentially tight.
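For concreteness, the SDP relaxation of model (1) is typically obtained by replacing $xx^*$ with a positive semidefinite matrix $X$ and dropping the rank-one requirement. The form below is the standard construction and is stated here as an assumption about the relaxation this abstract refers to. The symbol $\mathbb{H}^n$ is notation introduced here for symmetric matrices (real case) or Hermitian matrices (complex case).

```latex
\min_{X \in \mathbb{H}^n} \ \operatorname{tr}(CX)
\quad \text{s.t.} \quad \operatorname{tr}(A_k X) \ge 1,\ \ k=0,1,\dots,m,
\qquad X \succeq 0 .
```

Model (2) is relaxed in the same way, with $\max$ in place of $\min$ and $\le$ in place of $\ge$.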