Sohail Bahmani

ML
h-index: 2
15 papers
406 citations
Novelty: 51%
AI Score: 31

15 Papers

NA Jun 26, 2012
A Unifying Analysis of Projected Gradient Descent for $\ell_p$-constrained Least Squares

Sohail Bahmani, Bhiksha Raj

In this paper we study the performance of the Projected Gradient Descent (PGD) algorithm for $\ell_{p}$-constrained least squares problems that arise in the framework of Compressed Sensing. Relying on the Restricted Isometry Property, we provide convergence guarantees for this algorithm for the entire range of $0\leq p\leq1$. These guarantees include and generalize the existing results for the Iterative Hard Thresholding algorithm and provide a new accuracy guarantee for the Iterative Soft Thresholding algorithm as special cases. Our results suggest that in this group of algorithms, as $p$ increases from zero to one, conditions required to guarantee accuracy become stricter and robustness to noise deteriorates.
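
As a rough illustration of the $p=0$ end of this family, the sketch below runs projected gradient descent on $\frac{1}{2}\|y - Ax\|_2^2$ with projection onto the set of $s$-sparse vectors, which is the Iterative Hard Thresholding iteration. It is a minimal sketch and not the authors' code: the function names, the zero initialization, and the default step size are assumptions, and it is written in plain Python to stay dependency-free. Projection onto an $\ell_p$ ball for $0<p\leq1$ is not covered here.

```python
def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v and zero out the rest."""
    keep = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:s]
    out = [0.0] * len(v)
    for i in keep:
        out[i] = v[i]
    return out


def iht(A, y, s, step=1.0, iters=100):
    """Projected gradient descent on 0.5 * ||y - A x||^2 over s-sparse x.

    A is a list of rows, y a list of measurements. The step size and the
    zero starting point are illustrative assumptions.
    """
    n, d = len(A), len(A[0])
    x = [0.0] * d
    for _ in range(iters):
        # Residual r = y - A x.
        r = [y[i] - sum(A[i][j] * x[j] for j in range(d)) for i in range(n)]
        # Gradient step: z = x + step * A^T r.
        z = [x[j] + step * sum(A[i][j] * r[i] for i in range(n)) for j in range(d)]
        # Projection onto the set of s-sparse vectors.
        x = hard_threshold(z, s)
    return x
```

With measurements given as nested lists, a call such as `iht(A, y, s=5)` returns an $s$-sparse estimate. Whether it is accurate depends on conditions of the kind the abstract describes (for example, a restricted isometry property of `A`) and on a suitable step size.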

ML Nov 6, 2024
A Fundamental Accuracy--Robustness Trade-off in Regression and Classification

Sohail Bahmani

We derive a fundamental trade-off between standard and adversarial risk in a rather general situation that formalizes the following simple intuition: "If no (nearly) optimal predictor is smooth, adversarial robustness comes at the cost of accuracy." As a concrete example, we evaluate the derived trade-off in regression with polynomial ridge functions under mild regularity conditions. Generalizing our analysis of this example, we formulate a necessary condition under which adversarial robustness can be achieved without significant degradation of the accuracy. This necessary condition is expressed in terms of a quantity that resembles the Poincaré constant of the data distribution.

OC Oct 28, 2021
Decentralized Feature-Distributed Optimization for Generalized Linear Models

Brighton Ancelin, Sohail Bahmani, Justin Romberg

We consider the "all-for-one" decentralized learning problem for generalized linear models. The features of each sample are partitioned among several collaborating agents in a connected network, but only one agent observes the response variables. To solve the regularized empirical risk minimization in this distributed setting, we apply the Chambolle--Pock primal--dual algorithm to an equivalent saddle-point formulation of the problem. The primal and dual iterations are either in closed-form or reduce to coordinate-wise minimization of scalar convex functions. We establish convergence rates for the empirical risk minimization under two different assumptions on the loss function (Lipschitz and square root Lipschitz), and show how they depend on the characteristics of the design matrix and the Laplacian of the network.
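
To give a feel for the Chambolle-Pock primal-dual iteration mentioned above, here is a minimal sketch on a centralized toy problem, $\min_x \frac{1}{2}\|Kx-b\|_2^2 + \lambda\|x\|_1$, where both proximal steps happen to be closed-form. This is not the paper's decentralized, feature-distributed formulation: the objective, the function names, and the step-size rule based on the Frobenius norm are all assumptions made for illustration.

```python
import math


def soft_threshold(v, t):
    """Entrywise soft thresholding: the prox of t * ||.||_1."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]


def chambolle_pock_lasso(K, b, lam, iters=2000):
    """Minimize 0.5 * ||K x - b||^2 + lam * ||x||_1 by a primal-dual iteration.

    K is a list of rows. The saddle-point form is
    min_x max_y <K x, y> - f*(y) + lam * ||x||_1 with f(z) = 0.5 * ||z - b||^2.
    """
    n, d = len(K), len(K[0])
    # ||K||_2 <= ||K||_F, so these steps satisfy sigma * tau * ||K||_2^2 < 1.
    fro = math.sqrt(sum(k * k for row in K for k in row))
    sigma = tau = 0.99 / fro
    x = [0.0] * d
    x_bar = x[:]
    y = [0.0] * n
    for _ in range(iters):
        Kx_bar = [sum(K[i][j] * x_bar[j] for j in range(d)) for i in range(n)]
        # Dual step: prox of sigma * f*, which has the closed form below.
        y = [(y[i] + sigma * Kx_bar[i] - sigma * b[i]) / (1.0 + sigma)
             for i in range(n)]
        Kty = [sum(K[i][j] * y[i] for i in range(n)) for j in range(d)]
        # Primal step: prox of tau * lam * ||.||_1 is soft thresholding.
        x_new = soft_threshold([x[j] - tau * Kty[j] for j in range(d)], tau * lam)
        # Extrapolation with theta = 1.
        x_bar = [2.0 * x_new[j] - x[j] for j in range(d)]
        x = x_new
    return x
```

When `K` is the identity, the minimizer is the soft-thresholded vector `b`, which makes a convenient sanity check for the iteration.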

ML Mar 12, 2021
Max-Linear Regression by Convex Programming

Seonho Kim, Sohail Bahmani, Kiryung Lee

We consider the multivariate max-linear regression problem where the model parameters $\boldsymbol{\beta}_{1},\dotsc,\boldsymbol{\beta}_{k}\in\mathbb{R}^{p}$ need to be estimated from $n$ independent samples of the (noisy) observations $y = \max_{1\leq j \leq k} \boldsymbol{\beta}_{j}^{\mathsf{T}} \boldsymbol{x} + \mathrm{noise}$. The max-linear model vastly generalizes the conventional linear model, and it can approximate any convex function to an arbitrary accuracy when the number of linear models $k$ is large enough. However, the inherent nonlinearity of the max-linear model renders the estimation of the regression parameters computationally challenging. In particular, no estimator based on convex programming is known in the literature. We formulate and analyze a scalable convex program given by anchored regression (AR) as the estimator for the max-linear regression problem. Under the standard Gaussian observation setting, we present a non-asymptotic performance guarantee showing that the convex program recovers the parameters with high probability. When the $k$ linear components are equally likely to achieve the maximum, our result shows that a sufficient number of noise-free observations for exact recovery scales as $k^{4}p$ up to a logarithmic factor. This sample complexity coincides with that of alternating minimization (Ghosh et al., 2021). Moreover, the same sample complexity applies when the observations are corrupted with arbitrary deterministic noise. We provide empirical results that show that our method performs as our theoretical result predicts, and is competitive with the alternating minimization algorithm, particularly in the presence of multiplicative Bernoulli noise. Furthermore, we also show empirically that a recursive application of AR can significantly improve the estimation accuracy.

ST Apr 6, 2020
Low-Rank Matrix Estimation From Rank-One Projections by Unlifted Convex Optimization

Sohail Bahmani, Kiryung Lee

We study an estimator with a convex formulation for recovery of low-rank matrices from rank-one projections. Using initial estimates of the factors of the target $d_1\times d_2$ matrix of rank-$r$, the estimator admits a practical subgradient method operating in a space of dimension $r(d_1+d_2)$. This property makes the estimator significantly more scalable than the convex estimators based on lifting and semidefinite programming. Furthermore, we present a streamlined analysis for exact recovery under the real Gaussian measurement model, as well as the partially derandomized measurement model by using the spherical $t$-design. We show that under both models the estimator succeeds, with high probability, if the number of measurements exceeds $r^2 (d_1+d_2)$ up to some logarithmic factors. This sample complexity improves on the existing results for nonconvex iterative algorithms.

ML Aug 26, 2019
Convex Programming for Estimation in Nonlinear Recurrent Models

Sohail Bahmani, Justin Romberg

We propose a formulation for nonlinear recurrent models that includes simple parametric models of recurrent neural networks as a special case. The proposed formulation leads to a natural estimator in the form of a convex program. We provide a sample complexity for this estimator in the case of stable dynamics, where the nonlinear recursion has a certain contraction property, and under certain regularity conditions on the input distribution. We evaluate the performance of the estimator by simulation on synthetic data. These numerical experiments also suggest the extent to which the imposed theoretical assumptions may be relaxed.

ML Jun 19, 2018
Estimation from Non-Linear Observations via Convex Programming with Application to Bilinear Regression

Sohail Bahmani

We propose a computationally efficient estimator, formulated as a convex program, for a broad class of non-linear regression problems that involve difference of convex (DC) non-linearities. The proposed method can be viewed as a significant extension of the "anchored regression" method formulated and analyzed in [10] for regression with convex non-linearities. Our main assumption, in addition to other mild statistical and computational assumptions, is availability of a certain approximation oracle for the average of the gradients of the observation functions at a ground truth. Under this assumption and using a PAC-Bayesian analysis we show that the proposed estimator produces an accurate estimate with high probability. As a concrete example, we study the proposed framework in the bilinear regression problem with Gaussian factors and quantify a sufficient sample complexity for exact recovery. Furthermore, we describe a computationally tractable scheme that provably produces the required approximation oracle in the considered bilinear regression problem.

LG Feb 17, 2017
Solving Equations of Random Convex Functions via Anchored Regression

Sohail Bahmani, Justin Romberg

We consider the question of estimating a solution to a system of equations that involve convex nonlinearities, a problem that is common in machine learning and signal processing. Because of these nonlinearities, conventional estimators based on empirical risk minimization generally involve solving a non-convex optimization program. We propose anchored regression, a new approach based on convex programming that amounts to maximizing a linear functional (perhaps augmented by a regularizer) over a convex set. The proposed convex program is formulated in the natural space of the problem, and avoids the introduction of auxiliary variables, making it computationally favorable. Working in the native space also provides great flexibility as structural priors (e.g., sparsity) can be seamlessly incorporated. For our analysis, we model the equations as being drawn from a fixed set according to a probability law. Our main results provide guarantees on the accuracy of the estimator in terms of the number of equations we are solving, the amount of noise present, a measure of statistical complexity of the random equations, and the geometry of the regularizer at the true solution. We also provide recipes for constructing the anchor vector (that determines the linear functional to maximize) directly from the observed data.

IT Oct 13, 2016
Phase Retrieval Meets Statistical Learning Theory: A Flexible Convex Relaxation

Sohail Bahmani, Justin Romberg

We propose a flexible convex relaxation for the phase retrieval problem that operates in the natural domain of the signal. Therefore, we avoid the prohibitive computational cost associated with "lifting" and semidefinite programming (SDP) in methods such as PhaseLift and compete with recently developed non-convex techniques for phase retrieval. We relax the quadratic equations for phaseless measurements to inequality constraints, each of which represents a symmetric "slab". Through a simple convex program, our proposed estimator finds an extreme point of the intersection of these slabs that is best aligned with a given anchor vector. We characterize geometric conditions that certify success of the proposed estimator. Furthermore, using classic results in statistical learning theory, we show that for random measurements the geometric certificates hold with high probability at an optimal sample complexity. Phase transition of our estimator is evaluated through simulations. Our numerical experiments also suggest that the proposed method can solve phase retrieval problems with coded diffraction measurements as well.
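
One way to write down the relaxation described above is the following display. It is a sketch based only on the abstract, with assumed notation: $\boldsymbol{a}_i$ for the measurement vectors, $b_i$ for the phaseless (squared-magnitude) measurements, $M$ for their number, and $\boldsymbol{a}_0$ for the anchor vector. The paper's exact formulation may differ in details such as noise handling.

```latex
$$
\hat{\boldsymbol{x}} \in \operatorname*{arg\,max}_{\boldsymbol{x}}\ \operatorname{Re}\langle \boldsymbol{a}_0, \boldsymbol{x}\rangle
\quad \text{subject to} \quad
|\langle \boldsymbol{a}_i, \boldsymbol{x}\rangle|^{2} \leq b_i, \qquad i = 1,\dotsc,M.
$$
```

Each constraint $|\langle \boldsymbol{a}_i, \boldsymbol{x}\rangle|^{2} \leq b_i$ is the symmetric slab obtained by relaxing the equation $|\langle \boldsymbol{a}_i, \boldsymbol{x}\rangle|^{2} = b_i$. The objective is linear, so the program is convex and stays in the signal's own dimension rather than a lifted matrix space.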

IT Oct 27, 2015
Efficient Compressive Phase Retrieval with Constrained Sensing Vectors

Sohail Bahmani, Justin Romberg

We propose a robust and efficient approach to the problem of compressive phase retrieval in which the goal is to reconstruct a sparse vector from the magnitude of a number of its linear measurements. The proposed framework relies on constrained sensing vectors and a two-stage reconstruction method that consists of two standard convex programs that are solved sequentially. In recent years, various methods have been proposed for compressive phase retrieval, but they have suboptimal sample complexity or lack robustness guarantees. The main obstacle has been that there are no straightforward convex relaxations for the type of structure in the target. Given a set of underdetermined measurements, there is a standard framework for recovering a sparse matrix, and a standard framework for recovering a low-rank matrix. However, a general, efficient method for recovering a jointly sparse and low-rank matrix has remained elusive. Deviating from the models with generic measurements, in this paper we show that if the sensing vectors are chosen at random from an incoherent subspace, then the low-rank and sparse structures of the target signal can be effectively decoupled. We show that a recovery algorithm that consists of a low-rank recovery stage followed by a sparse recovery stage will produce an accurate estimate of the target when the number of measurements is $\mathsf{O}(k\,\log\frac{d}{k})$, where $k$ and $d$ denote the sparsity level and the dimension of the input signal. We also evaluate the algorithm through numerical simulation.

IT Oct 8, 2015
Sketching for Simultaneously Sparse and Low-Rank Covariance Matrices

Sohail Bahmani, Justin Romberg

We introduce a technique for estimating a structured covariance matrix from observations of a random vector which have been sketched. Each observed random vector $\boldsymbol{x}_t$ is reduced to a single number by taking its inner product against one of a number of pre-selected vectors $\boldsymbol{a}_\ell$. These observations are used to form estimates of linear observations of the covariance matrix $\boldsymbol{\varSigma}$, which is assumed to be simultaneously sparse and low-rank. We show that if the sketching vectors $\boldsymbol{a}_\ell$ have a special structure, then we can use a straightforward two-stage algorithm that exploits this structure. We show that the estimate is accurate when the number of sketches is proportional to the maximum of the rank times the number of significant rows/columns of $\boldsymbol{\varSigma}$. Moreover, our algorithm takes direct advantage of the low-rank structure of $\boldsymbol{\varSigma}$ by only manipulating matrices that are far smaller than the original covariance matrix.

IT Sep 3, 2015
Compressive Deconvolution in Random Mask Imaging

Sohail Bahmani, Justin Romberg

We investigate the problem of reconstructing signals from a subsampled convolution of their modulated versions and a known filter. The problem is studied as it applies to specific imaging systems relying on spatial phase modulation by randomly coded "masks." The diversity induced by the random masks is deemed to improve the conditioning of the deconvolution problem while maintaining sampling efficiency. We analyze a linear model of the system, where the joint effect of the spatial modulation, blurring, and spatial subsampling is represented by a measurement matrix. We provide a bound on the conditioning of this measurement matrix in terms of the number of masks, the dimension of the image, and certain characteristics of the blurring kernel and subsampling operator. The derived bound shows that stable deconvolution is possible with high probability even if the total number of (scalar) measurements is within a logarithmic factor of the image size. Furthermore, beyond a critical number of masks determined by the extent of blurring and subsampling, every additional mask improves the conditioning of the measurement matrix. We also consider a more interesting scenario where the target image is sparse. We show that under mild conditions on the blurring kernel, with high probability the measurement matrix is a restricted isometry when the number of masks is within a logarithmic factor of the sparsity of the image. Therefore, the image can be reconstructed using many sparse recovery algorithms such as basis pursuit. The bound on the required number of masks is linear in the sparsity of the image but logarithmic in its dimension. The bound provides a quantitative view of the effect of the blurring and subsampling on the required number of masks, which is critical for designing efficient imaging systems.

IT Apr 6, 2015
Lifting for Blind Deconvolution in Random Mask Imaging: Identifiability and Convex Relaxation

Sohail Bahmani, Justin Romberg

In this paper we analyze the blind deconvolution of an image and an unknown blur in a coded imaging system. The measurements consist of subsampled convolutions of an unknown blurring kernel with multiple random binary modulations (coded masks) of the image. To perform the deconvolution, we consider a standard lifting of the image and the blurring kernel that transforms the measurements into a set of linear equations of the matrix formed by their outer product. Any rank-one solution to this system of equations provides a valid pair of an image and a blur. We first express the necessary and sufficient conditions for the uniqueness of a rank-one solution under some additional assumptions (uniform subsampling and no limit on the number of coded masks). These conditions are a special case of a previously established result regarding identifiability in the matrix completion problem. We also characterize a low-dimensional subspace model for the blur kernel that is sufficient to guarantee identifiability, including the interesting instance of "bandpass" blur kernels. Next, assuming the bandpass model for the blur kernel, we show that the image and the blur kernel can be found using nuclear norm minimization. Our main results show that recovery is achieved (with high probability) when the number of masks is on the order of $\mu\log^{2}L\,\log\frac{Le}{\mu}\,\log\log\left(N+1\right)$, where $\mu$ is the coherence of the blur, $L$ is the dimension of the image, and $N$ is the number of measured samples per mask.

ML Sep 7, 2012
Learning Model-Based Sparsity via Projected Gradient Descent

Sohail Bahmani, Petros T. Boufounos, Bhiksha Raj

Several convex formulation methods have been proposed previously for statistical estimation with structured sparsity as the prior. These methods often require a carefully tuned regularization parameter, and the tuning is often a cumbersome or heuristic exercise. Furthermore, the estimate that these methods produce might not belong to the desired sparsity model, albeit accurately approximating the true parameter. Therefore, greedy-type algorithms could often be more desirable in estimating structured-sparse parameters. So far, these greedy methods have mostly focused on linear statistical models. In this paper we study the projected gradient descent with non-convex structured-sparse parameter model as the constraint set. Should the cost function have a Stable Model-Restricted Hessian, the algorithm produces an approximation of the desired minimizer. As an example we elaborate on application of the main results to estimation in Generalized Linear Models.

ML Mar 25, 2012
Greedy Sparsity-Constrained Optimization

Sohail Bahmani, Bhiksha Raj, Petros Boufounos

Sparsity-constrained optimization has wide applicability in machine learning, statistics, and signal processing problems such as feature selection and Compressive Sensing. A vast body of work has studied sparsity-constrained optimization from theoretical, algorithmic, and application aspects in the context of sparse estimation in linear models where the fidelity of the estimate is measured by the squared error. In contrast, relatively little effort has been made in the study of sparsity-constrained optimization in cases where nonlinear models are involved or the cost function is not quadratic. In this paper we propose a greedy algorithm, Gradient Support Pursuit (GraSP), to approximate sparse minima of cost functions of arbitrary form. Should a cost function have a Stable Restricted Hessian (SRH) or a Stable Restricted Linearization (SRL), both of which are introduced in this paper, our algorithm is guaranteed to produce a sparse vector within a bounded distance from the true sparse optimum. Our approach generalizes known results for quadratic cost functions that arise in sparse linear regression and Compressive Sensing. We also evaluate the performance of GraSP through numerical simulations on synthetic data, where the algorithm is employed for sparse logistic regression with and without $\ell_2$-regularization.