Necdet Serhat Aybat

OC
h-index22
8papers
199citations
Novelty45%
AI Score35

8 Papers

OCAug 25, 2011
A First-order Augmented Lagrangian Method for Compressed Sensing

Necdet Serhat Aybat, Garud Iyengar

We propose a first-order augmented Lagrangian algorithm (FAL) for solving the basis pursuit problem. FAL computes a solution to this problem by inexactly solving a sequence of L1-regularized least squares sub-problems. These sub-problems are solved using an infinite memory proximal gradient algorithm wherein each update reduces to "shrinkage" or constrained "shrinkage". We show that FAL converges to an optimal solution of the basis pursuit problem whenever the solution is unique, which is the case with very high probability for compressed sensing problems. We construct a parameter sequence such that the corresponding FAL iterates are eps-feasible and eps-optimal for all eps>0 within O(log(1/eps)) FAL iterations. Moreover, FAL requires at most O(1/eps) matrix-vector multiplications of the form Ax or A^Ty to compute an eps-feasible, eps-optimal solution. We show that FAL can be easily extended to solve the basis pursuit denoising problem when there is a non-trivial level of noise on the measurements. We report the results of numerical experiments comparing FAL with the state-of-the-art algorithms for both noisy and noiseless compressed sensing problems. A striking property of FAL that we observed in the numerical experiments with randomly generated instances when there is no measurement noise was that FAL always correctly identifies the support of the target signal without any thresholding or post-processing, for moderately small error tolerance values.

OCSep 17, 2025
Accelerated Gradient Methods with Biased Gradient Estimates: Risk Sensitivity, High-Probability Guarantees, and Large Deviation Bounds

Mert Gürbüzbalaban, Yasa Syed, Necdet Serhat Aybat

We study trade-offs between convergence rate and robustness to gradient errors in the context of first-order methods. Our focus is on generalized momentum methods (GMMs)--a broad class that includes Nesterov's accelerated gradient, heavy-ball, and gradient descent methods--for minimizing smooth strongly convex objectives. We allow stochastic gradient errors that may be adversarial and biased, and quantify robustness of these methods to gradient errors via the risk-sensitive index (RSI) from robust control theory. For quadratic objectives with i.i.d. Gaussian noise, we give closed form expressions for RSI in terms of solutions to 2x2 matrix Riccati equations, revealing a Pareto frontier between RSI and convergence rate over the choice of step-size and momentum parameters. We then prove a large-deviation principle for time-averaged suboptimality in the large iteration limit and show that the rate function is, up to a scaling, the convex conjugate of the RSI function. We further show that the rate function and RSI are linked to the $H_\infty$-norm--a measure of robustness to the worst-case deterministic gradient errors--so that stronger worst-case robustness (smaller $H_\infty$-norm) leads to sharper decay of the tail probabilities for the average suboptimality. Beyond quadratics, under potentially biased sub-Gaussian gradient errors, we derive non-asymptotic bounds on a finite-time analogue of the RSI, yielding finite-time high-probability guarantees and non-asymptotic large-deviation bounds for the averaged iterates. In the case of smooth strongly convex functions, we also observe an analogous trade-off between RSI and convergence-rate bounds. To our knowledge, these are the first non-asymptotic guarantees for GMMs with biased gradients and the first risk-sensitive analysis of GMMs. Finally, we provide numerical experiments on a robust regression problem to illustrate our results.

OCFeb 19, 2022
A Variance-Reduced Stochastic Accelerated Primal Dual Algorithm

Bugra Can, Mert Gurbuzbalaban, Necdet Serhat Aybat

In this work, we consider strongly convex strongly concave (SCSC) saddle point (SP) problems $\min_{x\in\mathbb{R}^{d_x}}\max_{y\in\mathbb{R}^{d_y}}f(x,y)$ where $f$ is $L$-smooth, $f(.,y)$ is $μ$-strongly convex for every $y$, and $f(x,.)$ is $μ$-strongly concave for every $x$. Such problems arise frequently in machine learning in the context of robust empirical risk minimization (ERM), e.g. $\textit{distributionally robust}$ ERM, where partial gradients are estimated using mini-batches of data points. Assuming we have access to an unbiased stochastic first-order oracle we consider the stochastic accelerated primal dual (SAPD) algorithm recently introduced in Zhang et al. [2021] for SCSC SP problems as a robust method against gradient noise. In particular, SAPD recovers the well-known stochastic gradient descent ascent (SGDA) as a special case when the momentum parameter is set to zero and can achieve an accelerated rate when the momentum parameter is properly tuned, i.e., improving the $κ\triangleq L/μ$ dependence from $κ^2$ for SGDA to $κ$. We propose efficient variance-reduction strategies for SAPD based on Richardson-Romberg extrapolation and show that our method improves upon SAPD both in practice and in theory.

OCJan 23, 2019
A Universally Optimal Multistage Accelerated Stochastic Gradient Method

Necdet Serhat Aybat, Alireza Fallah, Mert Gurbuzbalaban et al.

We study the problem of minimizing a strongly convex, smooth function when we have noisy estimates of its gradient. We propose a novel multistage accelerated algorithm that is universally optimal in the sense that it achieves the optimal rate both in the deterministic and stochastic case and operates without knowledge of noise characteristics. The algorithm consists of stages that use a stochastic version of Nesterov's method with a specific restart and parameters selected to achieve the fastest reduction in the bias-variance terms in the convergence rate bounds.

OCJun 9, 2018
Efficient Optimization Algorithms for Robust Principal Component Analysis and Its Variants

Shiqian Ma, Necdet Serhat Aybat

Robust PCA has drawn significant attention in the last decade due to its success in numerous application domains, ranging from bio-informatics, statistics, and machine learning to image and video processing in computer vision. Robust PCA and its variants such as sparse PCA and stable PCA can be formulated as optimization problems with exploitable special structures. Many specialized efficient optimization methods have been proposed to solve robust PCA and related problems. In this paper we review existing optimization methods for solving convex and nonconvex relaxations/variants of robust PCA, discuss their advantages and disadvantages, and elaborate on their convergence behaviors. We also provide some insights for possible future research directions including new algorithmic frameworks that might be suitable for implementing on multi-processor setting to handle large-scale problems.

OCMay 27, 2018
Robust Accelerated Gradient Methods for Smooth Strongly Convex Functions

Necdet Serhat Aybat, Alireza Fallah, Mert Gurbuzbalaban et al.

We study the trade-offs between convergence rate and robustness to gradient errors in designing a first-order algorithm. We focus on gradient descent (GD) and accelerated gradient (AG) methods for minimizing strongly convex functions when the gradient has random errors in the form of additive white noise. With gradient errors, the function values of the iterates need not converge to the optimal value; hence, we define the robustness of an algorithm to noise as the asymptotic expected suboptimality of the iterate sequence to input noise power. For this robustness measure, we provide exact expressions for the quadratic case using tools from robust control theory and tight upper bounds for the smooth strongly convex case using Lyapunov functions certified through matrix inequalities. We use these characterizations within an optimization problem which selects parameters of each algorithm to achieve a particular trade-off between rate and robustness. Our results show that AG can achieve acceleration while being more robust to random gradient errors. This behavior is quite different than previously reported in the deterministic gradient noise setting. We also establish some connections between the robustness of an algorithm and how quickly it can converge back to the optimal solution if it is perturbed from the optimal point with deterministic noise. Our framework also leads to practical algorithms that can perform better than other state-of-the-art methods in the presence of random gradient noise.

MLMay 11, 2016
Generalized Sparse Precision Matrix Selection for Fitting Multivariate Gaussian Random Fields to Large Data Sets

Sam Davanloo Tajbakhsh, Necdet Serhat Aybat, Enrique del Castillo

We present a new method for estimating multivariate, second-order stationary Gaussian Random Field (GRF) models based on the Sparse Precision matrix Selection (SPS) algorithm, proposed by Davanloo et al. (2015) for estimating scalar GRF models. Theoretical convergence rates for the estimated between-response covariance matrix and for the estimated parameters of the underlying spatial correlation function are established. Numerical tests using simulated and real datasets validate our theoretical findings. Data segmentation is used to handle large data sets.

MLMay 21, 2014
On the Theoretical Guarantees for Parameter Estimation of Gaussian Random Field Models: A Sparse Precision Matrix Approach

Sam Davanloo Tajbakhsh, Necdet Serhat Aybat, Enrique Del Castillo

Iterative methods for fitting a Gaussian Random Field (GRF) model via maximum likelihood (ML) estimation requires solving a nonconvex optimization problem. The problem is aggravated for anisotropic GRFs where the number of covariance function parameters increases with the dimension. Even evaluation of the likelihood function requires $O(n^3)$ floating point operations, where $n$ denotes the number of data locations. In this paper, we propose a new two-stage procedure to estimate the parameters of second-order stationary GRFs. First, a convex likelihood problem regularized with a weighted $\ell_1$-norm, utilizing the available distance information between observation locations, is solved to fit a sparse precision (inverse covariance) matrix to the observed data. Second, the parameters of the covariance function are estimated by solving a least squares problem. Theoretical error bounds for the solutions of stage I and II problems are provided, and their tightness are investigated.