Babak Hassibi

LG
h-index: 5
58 papers
1,066 citations
Novelty: 56%
AI Score: 57

58 Papers

IT Mar 31, 2008
On the reconstruction of block-sparse signals with an optimal number of measurements

Mihailo Stojnic, Farzad Parvaresh, Babak Hassibi

Let A be an M by N matrix (M < N) which is an instance of a real random Gaussian ensemble. In compressed sensing we are interested in finding the sparsest solution to the system of equations A x = y for a given y. In general, whenever the sparsity of x is smaller than half the dimension of y then with overwhelming probability over A the sparsest solution is unique and can be found by an exhaustive search over x with an exponential time complexity for any y. The recent work of Candès, Donoho, and Tao shows that minimization of the L_1 norm of x subject to A x = y results in the sparsest solution provided the sparsity of x, say K, is smaller than a certain threshold for a given number of measurements. Specifically, if the dimension of y approaches the dimension of x, the sparsity of x should be K < 0.239 N. Here, we consider the case where x is d-block sparse, i.e., x consists of n = N / d blocks where each block is either a zero vector or a nonzero vector. Instead of L_1-norm relaxation, we consider the following relaxation: min_x \| X_1 \|_2 + \| X_2 \|_2 + ... + \| X_n \|_2, subject to A x = y, where X_i = (x_{(i-1)d+1}, x_{(i-1)d+2}, ..., x_{i d}) for i = 1, 2, ..., n. Our main result is that as n -> \infty, the minimization finds the sparsest solution to Ax = y, with overwhelming probability in A, for any x whose block sparsity is k/n < 1/2 - O(ε), provided M/N > 1 - 1/d, and d = Ω(\log(1/ε)/ε). The relaxation can be solved in polynomial time using semi-definite programming.
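
For readability, the block-wise relaxation described in this abstract can be typeset in display form. This only restates the program using the abstract's own symbols (A, x, y, N, d, n, X_i); nothing new is assumed.

```latex
\begin{aligned}
\min_{x \in \mathbb{R}^{N}} \quad & \sum_{i=1}^{n} \| X_i \|_2 \\
\text{subject to} \quad & A x = y, \\
\text{where} \quad & X_i = \left( x_{(i-1)d+1},\, x_{(i-1)d+2},\, \dots,\, x_{id} \right),
\quad i = 1, \dots, n, \quad n = N/d .
\end{aligned}
```

For d = 1 each block is a single entry, so the objective reduces to the ordinary L_1 norm of x.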

IT Nov 20, 2018
Rate-cost tradeoffs in control

Victoria Kostina, Babak Hassibi

Consider a control problem with a communication channel connecting the observer of a linear stochastic system to the controller. The goal of the controller is to minimize a quadratic cost function in the state variables and control signal, known as the linear quadratic regulator (LQR). We study the fundamental tradeoff between the communication rate $r$ bits/sec and the expected cost $b$. We obtain a lower bound on a certain rate-cost function, which quantifies the minimum directed mutual information between the channel input and output that is compatible with a target LQR cost. The rate-cost function has operational significance in multiple scenarios of interest: among others, it allows us to lower-bound the minimum communication rate for fixed and variable length quantization, and for control over noisy channels. We derive an explicit lower bound to the rate-cost function, which applies to the vector, non-Gaussian, and partially observed systems, thereby extending and generalizing an earlier explicit expression for the scalar Gaussian system, due to Tatikonda et al. The bound applies as long as the differential entropy of the system noise is not $-\infty$. It can be closely approached by a simple lattice quantization scheme that only quantizes the innovation, that is, the difference between the controller's belief about the current state and the true state. Via a separation principle between control and communication, similar results hold for causal lossy compression of additive noise Markov sources. Apart from standard dynamic programming arguments, our technical approach leverages the Shannon lower bound, develops new estimates for data compression with coding memory, and uses some recent results on high resolution variable-length vector quantization to prove that the new converse bounds are tight.

OC Aug 15, 2014
Optimal Placement of Distributed Energy Storage in Power Networks

Christos Thrampoulidis, Subhonmesh Bose, Babak Hassibi

We formulate the optimal placement, sizing and control of storage devices in a power network to minimize generation costs with the intent of load shifting. We assume deterministic demand, a linearized DC approximated power flow model and a fixed available storage budget. Our main result proves that when the generation costs are convex and nondecreasing, there always exists an optimal storage capacity allocation that places zero storage at generation-only buses that connect to the rest of the network via single links. This holds regardless of the demand profiles, generation capacities, line-flow limits and characteristics of the storage technologies. Through a counterexample, we illustrate that this result is not generally true for generation buses with multiple connections. For specific network topologies, we also characterize the dependence of the optimal generation cost on the available storage budget, generation capacities and flow constraints.

OC Nov 9, 2011
A Simplified Approach to Recovery Conditions for Low Rank Matrices

Samet Oymak, Karthik Mohan, Maryam Fazel et al.

Recovering sparse vectors and low-rank matrices from noisy linear measurements has been the focus of much recent research. Various reconstruction algorithms have been studied, including $\ell_1$ and nuclear norm minimization as well as $\ell_p$ minimization with $p<1$. These algorithms are known to succeed if certain conditions on the measurement map are satisfied. Proofs of robust recovery for matrices have so far been much more involved than in the vector case. In this paper, we show how several robust classes of recovery conditions can be extended from vectors to matrices in a simple and transparent way, leading to the best known restricted isometry and nullspace conditions for matrix recovery. Our results rely on the ability to "vectorize" matrices through the use of a key singular value inequality.

IT Jun 1, 2011
Linear Error Correcting Codes with Anytime Reliability

Ravi Teja Sukhavasi, Babak Hassibi

We consider rate R = k/n causal linear codes that map a sequence of k-dimensional binary vectors {b_t} to a sequence of n-dimensional binary vectors {c_t}, such that each c_t is a function of {b_1,b_2,...,b_t}. Such a code is called anytime reliable, for a particular binary-input memoryless channel, if at each time the probability of making an error about a source bit that was sent d time instants ago decays exponentially in d. Anytime reliable codes are useful in interactive communication problems and, in particular, can be used to stabilize unstable plants across noisy channels. Schulman proved the existence of such codes which, due to their structure, he called tree codes; however, to date, no explicit constructions and tractable decoding algorithms have been devised. In this paper, we show the existence of anytime reliable "linear" codes with "high probability", i.e., suitably chosen random linear causal codes are anytime reliable with high probability. The key is to consider time-invariant codes (i.e., ones with Toeplitz generator and parity check matrices), which obviates the need to union bound over all times. For the binary erasure channel we give a simple ML decoding algorithm whose average complexity is constant per time iteration and for which the probability that complexity at a given time t exceeds KC^3 decays exponentially in C. We show the efficacy of the method by simulating the stabilization of an unstable plant across a BEC, and remark on the tradeoffs between the utilization of the communication resources and the control performance.
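
To make the structure of a causal, time-invariant linear code concrete, the following minimal sketch encodes with a block-Toeplitz generator over GF(2). It is an illustration only: the block sizes, the uniform random draw of the blocks, and the helper names `random_toeplitz_blocks` and `causal_encode` are assumptions made here, not the ensemble or decoder analyzed in the paper.

```python
import numpy as np

def random_toeplitz_blocks(T, n, k, rng):
    """Draw T random binary n-by-k blocks G_0, ..., G_{T-1} (hypothetical ensemble).

    Time invariance means the block applied to b_j when producing c_t
    depends only on the lag t - j, i.e. the generator is block-Toeplitz.
    """
    return rng.integers(0, 2, size=(T, n, k), dtype=np.uint8)

def causal_encode(blocks, b):
    """Encode b of shape (T, k) into c of shape (T, n) over GF(2).

    c_t = sum over j <= t of G_{t-j} b_j (mod 2), so c_t is a function
    of b_1, ..., b_t only.
    """
    T, _ = b.shape
    n = blocks.shape[1]
    c = np.zeros((T, n), dtype=np.uint8)
    for t in range(T):
        acc = np.zeros(n, dtype=np.int64)
        for j in range(t + 1):
            acc += blocks[t - j].astype(np.int64) @ b[j].astype(np.int64)
        c[t] = acc % 2
    return c

rng = np.random.default_rng(0)
T, n, k = 8, 4, 2                      # rate R = k/n = 1/2
blocks = random_toeplitz_blocks(T, n, k, rng)
b = rng.integers(0, 2, size=(T, k), dtype=np.uint8)
c = causal_encode(blocks, b)
```

Changing a source vector at time s leaves every codeword symbol before time s unchanged, which is the causality property the abstract requires.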

ML Feb 13, 2023
Precise Asymptotic Analysis of Deep Random Feature Models

David Bosch, Ashkan Panahi, Babak Hassibi

We provide exact asymptotic expressions for the performance of regression by an $L$-layer deep random feature (RF) model, where the input is mapped through multiple random embedding and non-linear activation functions. For this purpose, we establish two key steps: First, we prove a novel universality result for RF models and deterministic data, by which we demonstrate that a deep random feature model is equivalent to a deep linear Gaussian model that matches it in the first and second moments, at each layer. Second, we make use of the convex Gaussian Min-Max theorem multiple times to obtain the exact behavior of deep RF models. We further characterize the variation of the eigendistribution in different layers of the equivalent Gaussian model, demonstrating that depth has a tangible effect on model performance despite the fact that only the last layer of the model is being trained.

IT Jun 8, 2016
(Almost) Practical Tree Codes

Anatoly Khina, Wael Halbawi, Babak Hassibi

We consider the problem of stabilizing an unstable plant driven by bounded noise over a digital noisy communication link, a scenario at the heart of networked control. To stabilize such a plant, one needs real-time encoding and decoding with an error probability profile that decays exponentially with the decoding delay. The works of Schulman and Sahai over the past two decades have developed the notions of tree codes and anytime capacity, and provided the theoretical framework for studying such problems. Nonetheless, there has been little practical progress in this area due to the absence of explicit constructions of tree codes with efficient encoding and decoding algorithms. Recently, linear time-invariant tree codes were proposed to achieve the desired result under maximum-likelihood decoding. In this work, we take one more step towards practicality, by showing that these codes can be efficiently decoded using sequential decoding algorithms, up to some loss in performance (and with some practical complexity caveats). We supplement our theoretical results with numerical simulations that demonstrate the effectiveness of the decoder in a control system setting.

LG Jun 17, 2022
Thompson Sampling Achieves $\tilde O(\sqrt{T})$ Regret in Linear Quadratic Control

Taylan Kargin, Sahin Lale, Kamyar Azizzadenesheli et al.

Thompson Sampling (TS) is an efficient method for decision-making under uncertainty, where an action is sampled from a carefully prescribed distribution which is updated based on the observed data. In this work, we study the problem of adaptive control of stabilizable linear-quadratic regulators (LQRs) using TS, where the system dynamics are unknown. Previous works have established that $\tilde O(\sqrt{T})$ frequentist regret is optimal for the adaptive control of LQRs. However, the existing methods either work only in restrictive settings, require a priori known stabilizing controllers, or utilize computationally intractable approaches. We propose an efficient TS algorithm for the adaptive control of LQRs, TS-based Adaptive Control (TSAC), that attains $\tilde O(\sqrt{T})$ regret, even for multidimensional systems, thereby solving the open problem posed in Abeille and Lazaric (2018). TSAC does not require an a priori known stabilizing controller and achieves fast stabilization of the underlying system by effectively exploring the environment in the early stages. Our result hinges on developing a novel lower bound on the probability that the TS provides an optimistic sample. By carefully prescribing an early exploration strategy and a policy update rule, we show that TS achieves order-optimal regret in adaptive control of multidimensional stabilizable LQRs. We empirically demonstrate the performance and the efficiency of TSAC in several adaptive control tasks.

SY Mar 23, 2011
Anytime Reliable Codes for Stabilizing Plants over Erasure Channels

Ravi Teja Sukhavasi, Babak Hassibi

The problem of stabilizing an unstable plant over a noisy communication link is an increasingly important one that arises in problems of distributed control and networked control systems. Although the work of Schulman and Sahai over the past two decades, and their development of the notions of "tree codes" and "anytime capacity", provides the theoretical framework for studying such problems, there has been scant practical progress in this area because explicit constructions of tree codes with efficient encoding and decoding did not exist. To stabilize an unstable plant driven by bounded noise over a noisy channel one needs real-time encoding and real-time decoding and a reliability which increases exponentially with delay, which is what tree codes guarantee. We prove the existence of linear tree codes with high probability and, for erasure channels, give an explicit construction with an expected encoding and decoding complexity that is constant per time instant. We give sufficient conditions on the rate and reliability required of the tree codes to stabilize vector plants and argue that they are asymptotically tight. This work takes a major step towards controlling plants over noisy channels, and we demonstrate the efficacy of the method through several examples.

SY Oct 28, 2016
Multi-Rate Control over AWGN Channels via Analog Joint Source-Channel Coding

Anatoly Khina, Gustav M. Pettersson, Victoria Kostina et al.

We consider the problem of controlling an unstable plant over an additive white Gaussian noise (AWGN) channel with a transmit power constraint, where the signaling rate of communication is larger than the sampling rate (for generating observations and applying control inputs) of the underlying plant. Such a situation is quite common since sampling is done at a rate that captures the dynamics of the plant and which is often much lower than the rate that can be communicated. This setting offers the opportunity of improving the system performance by employing multiple channel uses to convey a single message (output plant observation or control input). Common ways of doing so are through either repeating the message, or by quantizing it to a number of bits and then transmitting a channel coded version of the bits whose length is commensurate with the number of channel uses per sampled message. We argue that such "separated source and channel coding" can be suboptimal and propose to perform joint source-channel coding. Since the block length is short we obviate the need to go to the digital domain altogether and instead consider analog joint source-channel coding. For the case where the communication signaling rate is twice the sampling rate, we employ the Archimedean bi-spiral-based Shannon-Kotel'nikov analog maps to show significant improvement in stability margins and linear-quadratic Gaussian (LQG) costs over simple schemes that employ repetition.

IT Oct 19, 2017
Rate-cost tradeoffs in control. Part II: achievable scheme

Victoria Kostina, Babak Hassibi

Consider a distributed control problem with a communication channel connecting the observer of a linear stochastic system to the controller. The goal of the controller is to minimize a quadratic cost function in the state variables and control signal, known as the linear quadratic regulator (LQR). We study the fundamental tradeoff between the communication rate r bits/sec and the limsup of the expected cost b. In the companion paper, which can be read independently of the current one, we show a lower bound on a certain cost function, which quantifies the minimum mutual information between the channel input and output, given the past, that is compatible with a target LQR cost. The bound applies as long as the system noise has a probability density function, and it holds for a general class of codes that can take full advantage of the memory of the data observed so far and that are not constrained to have any particular structure. In this paper, we prove that the bound can be approached by a simple variable-length lattice quantization scheme, as long as the system noise satisfies a smoothness condition. The quantization scheme only quantizes the innovation, that is, the difference between the controller's belief about the current state and the encoder's state estimate. Our proof technique leverages some recent results on nonasymptotic high resolution vector quantization.

LG Oct 27, 2022
Stochastic Mirror Descent in Average Ensemble Models

Taylan Kargin, Fariborz Salehi, Babak Hassibi

The stochastic mirror descent (SMD) algorithm is a general class of training algorithms, which includes the celebrated stochastic gradient descent (SGD), as a special case. It utilizes a mirror potential to influence the implicit bias of the training algorithm. In this paper we explore the performance of the SMD iterates on mean-field ensemble models. Our results generalize earlier ones obtained for SGD on such models. The evolution of the distribution of parameters is mapped to a continuous time process in the space of probability distributions. Our main result gives a nonlinear partial differential equation to which the continuous time process converges in the asymptotic regime of large networks. The impact of the mirror potential appears through a multiplicative term that is equal to the inverse of its Hessian and which can be interpreted as defining a gradient flow over an appropriately defined Riemannian manifold. We provide numerical simulations which allow us to study and characterize the effect of the mirror potential on the performance of networks trained with SMD for some binary classification problems.
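
For orientation, the standard form of the SMD update can be written as below. The notation is assumed here rather than taken from the abstract: $\psi$ is a strictly convex differentiable mirror potential, $\eta$ a step size, and $L_{i_t}$ the loss on the sample drawn at step $t$.

```latex
\nabla \psi(w_{t+1}) = \nabla \psi(w_t) - \eta \, \nabla L_{i_t}(w_t),
\qquad
\psi(w) = \tfrac{1}{2} \| w \|_2^2
\;\Longrightarrow\;
w_{t+1} = w_t - \eta \, \nabla L_{i_t}(w_t) \quad \text{(SGD)} .
```

The second expression shows how SGD arises as the special case with a quadratic potential.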

LG Feb 18, 2023
The Generalization Error of Stochastic Mirror Descent on Over-Parametrized Linear Models

Danil Akhtiamov, Babak Hassibi

Despite being highly over-parametrized, and having the ability to fully interpolate the training data, deep networks are known to generalize well to unseen data. It is now understood that part of the reason for this is that the training algorithms used have certain implicit regularization properties that ensure interpolating solutions with "good" properties are found. This is best understood in linear over-parametrized models where it has been shown that the celebrated stochastic gradient descent (SGD) algorithm finds an interpolating solution that is closest in Euclidean distance to the initial weight vector. Different regularizers, replacing Euclidean distance with Bregman divergence, can be obtained if we replace SGD with stochastic mirror descent (SMD). Empirical observations have shown that in the deep network setting, SMD achieves a generalization performance that is different from that of SGD (and which depends on the choice of SMD's potential function). In an attempt to begin to understand this behavior, we obtain the generalization error of SMD for over-parametrized linear models for a binary classification problem where the two classes are drawn from a Gaussian mixture model. We present simulation results that validate the theory and, in particular, introduce two data models, one for which SMD with an $\ell_2$ regularizer (i.e., SGD) outperforms SMD with an $\ell_1$ regularizer, and one for which the reverse happens.
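
The implicit-regularization fact cited above for linear over-parametrized models can be checked numerically. The sketch below is a hedged illustration: it swaps in full-batch gradient descent on a least-squares loss in place of SGD (for determinism), and the problem sizes, seed, and step size are arbitrary choices made here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 20                       # n samples, d > n parameters (over-parametrized)
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
w0 = rng.standard_normal(d)        # initial weight vector

# A step size below 2 / lambda_max(X^T X) guarantees convergence.
eta = 1.0 / np.linalg.norm(X, 2) ** 2

w = w0.copy()
for _ in range(5000):
    w -= eta * X.T @ (X @ w - y)   # gradient of 0.5 * ||X w - y||^2

# Closed form: the interpolating solution closest to w0 in Euclidean distance.
w_closest = w0 + np.linalg.pinv(X) @ (y - X @ w0)
```

Every update moves `w` along the row space of `X`, so the iterates never leave `w0 + rowspace(X)`; the interpolating point in that affine set is exactly `w_closest`.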

OC Jun 3, 2022
Optimal Competitive-Ratio Control

Oron Sabag, Sahin Lale, Babak Hassibi

Inspired by competitive policy design approaches in online learning, new control paradigms such as competitive-ratio and regret-optimal control have been recently proposed as alternatives to the classical $\mathcal{H}_2$ and $\mathcal{H}_\infty$ approaches. These competitive metrics compare the control cost of the designed controller against the cost of a clairvoyant controller, which has access to past, present, and future disturbances, in terms of ratio and difference, respectively. While prior work provided the optimal solution for the regret-optimal control problem, in competitive-ratio control, the solution is only provided for the sub-optimal problem. In this work, we derive the optimal solution to the competitive-ratio control problem. We show that the optimal competitive ratio formula can be computed as the maximal eigenvalue of a simple matrix, and provide a state-space controller that achieves the optimal competitive ratio. We conduct an extensive numerical study to verify this analytical solution, and demonstrate that the optimal competitive-ratio controller outperforms other controllers on several large scale practical systems. The key techniques that underpin our explicit solution are a reduction of the control problem to a Nehari problem and a novel factorization of the clairvoyant controller's cost. We reveal an interesting relation between the explicit solutions that now exist for both competitive control paradigms by formulating a regret-optimal control framework with weight functions that can also be utilized for practical purposes.

ML Mar 11
Dual Space Preconditioning for Gradient Descent in the Overparameterized Regime

Reza Ghane, Danil Akhtiamov, Babak Hassibi

In this work we study the convergence properties of the Dual Space Preconditioned Gradient Descent, encompassing optimizers such as Normalized Gradient Descent, Gradient Clipping and Adam. We consider preconditioners of the form $\nabla K$, where $K: \mathbb{R}^p \to \mathbb{R}$ is convex and assume that the latter is applied to train an over-parameterized linear model with loss of the form $\ell({X} {W} - {Y})$, for weights ${W} \in \mathbb{R}^{d \times k}$, labels ${Y} \in \mathbb{R}^{n \times k}$ and data ${X} \in \mathbb{R}^{n \times d}$. Under the aforementioned assumptions, we prove that the iterates of the preconditioned gradient descent always converge to a point ${W}_{\infty} \in \mathbb{R}^{d \times k}$ satisfying ${X}{W}_{\infty} = {Y}$. Our proof techniques are of independent interest as we introduce a novel version of the Bregman Divergence with accompanying identities that allow us to establish convergence. We also study the implicit bias of Dual Space Preconditioned Gradient Descent. First, we demonstrate empirically that, for general $K(\cdot)$, ${W}_\infty$ depends on the chosen learning rate, hindering a precise characterization of the implicit bias. Then, for preconditioners of the form $K({G}) = h(\|{G}\|_F)$, known as \textit{isotropic preconditioners}, we show that ${W}_\infty$ minimizes $\|{W}_\infty - {W}_0\|_F^2$ subject to ${X}{W}_\infty = {Y}$, where ${W}_0$ is the initialization. Denoting the convergence point of GD initialized at ${W}_0$ by ${W}_{\text{GD}, \infty}$, we thus note ${W}_{\infty} = {W}_{\text{GD}, \infty}$ for isotropic preconditioners. Finally, we show that a similar fact holds for general preconditioners up to a multiplicative constant, namely, $\|{W}_0 - {W}_{\infty}\|_F \le c \|{W}_0 - {W}_{\text{GD}, \infty}\|_F$ for a constant $c>0$.

LG Nov 3, 2023
Regularized Linear Regression for Binary Classification

Danil Akhtiamov, Reza Ghane, Babak Hassibi

Regularized linear regression is a promising approach for binary classification problems in which the training set has noisy labels since the regularization term can help to avoid interpolating the mislabeled data points. In this paper we provide a systematic study of the effects of the regularization strength on the performance of linear classifiers that are trained to solve binary classification problems by minimizing a regularized least-squares objective. We consider the over-parametrized regime and assume that the classes are generated from a Gaussian Mixture Model (GMM) where a fraction $c<\frac{1}{2}$ of the training data is mislabeled. Under these assumptions, we rigorously analyze the classification errors resulting from the application of ridge, $\ell_1$, and $\ell_\infty$ regression. In particular, we demonstrate that ridge regression invariably improves the classification error. We prove that $\ell_1$ regularization induces sparsity and observe that in many cases one can sparsify the solution by up to two orders of magnitude without any considerable loss of performance, even though the GMM has no underlying sparsity structure. For $\ell_\infty$ regularization we show that, for large enough regularization strength, the optimal weights concentrate around two values of opposite sign. We observe that in many cases the corresponding "compression" of each weight to a single bit leads to very little loss in performance. These latter observations can have significant practical ramifications.
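
As a concrete picture of the ridge case, the sketch below fits a regularized least-squares classifier on a two-class Gaussian mixture with a fraction c of flipped labels. The dimensions, the class-mean direction `mu`, the value of `lam`, and the helper name `ridge_weights` are all illustrative assumptions; the code does not reproduce the paper's asymptotic analysis or its error formulas.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 40, 200                       # over-parametrized: d > n
mu = np.ones(d) / np.sqrt(d)         # hypothetical class-mean direction
labels = rng.choice([-1.0, 1.0], size=n)
X = labels[:, None] * mu[None, :] + rng.standard_normal((n, d))

# Mislabel a fraction c < 1/2 of the training points.
c = 0.1
y = labels.copy()
flipped = rng.choice(n, size=int(c * n), replace=False)
y[flipped] *= -1.0

def ridge_weights(X, y, lam):
    """Closed-form minimizer of ||X w - y||^2 + lam * ||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

lam = 5.0
w = ridge_weights(X, y, lam)
train_predictions = np.sign(X @ w)   # linear classifier: sign of the score
```

Sweeping `lam` and measuring the error on fresh samples from the same mixture is one way to explore empirically how regularization strength trades off against fitting the flipped labels.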

SY Apr 13
Quantized Online LQR

Barron Han, Victoria Kostina, Babak Hassibi

We study online linear-quadratic regulation (LQR) with unknown dynamics under communication rate constraints. Classical networked control quantizes the plant state at every time step, requiring $O(T)$ total bits while injecting persistent quantization noise that limits control performance. We consider a setting where the plant observes its state locally and can estimate system dynamics via ordinary least squares, while a remote controller possesses knowledge of the control cost. Rather than quantizing the raw state, the plant transmits learned dynamics estimates over a rate-limited uplink, and the controller returns the optimal control policy so that the plant can compute actions locally using its superior state knowledge. We first prove a fundamental information-theoretic lower bound: any scheme achieving $O(T^\alpha)$ regret for $\alpha \in [1/2,1)$ compared to the optimal infinite horizon LQR controller that knows the true system dynamics must transmit at least $\Omega(\log T)$ bits. We then design the \textbf{Quantized Certainty Equivalent (QCE-LQR)} algorithm, which matches this bound. The resulting regret bound contains inflation factors $Q_{\mathrm{slow}}(\varrho)$ and $Q_{\mathrm{fast}}(\varrho)$ that vanish as the codebook resolution increases, smoothly recovering the unquantized baseline regret. Numerical experiments on four benchmark systems -- from a scalar unstable plant to a 24-parameter Boeing 747 lateral model -- confirm that a variant of QCE-LQR achieves regret comparable to an unquantized certainty equivalent controller over a horizon of $T=10{,}000$ steps.
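
The two ingredients named in the abstract, an ordinary-least-squares estimate of the dynamics and a quantized transmission of that estimate, can be sketched as follows. This is not the QCE-LQR algorithm: the system matrices, noise level, horizon, and the plain uniform quantizer `uniform_quantize` are hypothetical choices made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.1, 0.2], [0.0, 0.9]])    # hypothetical true dynamics
B = np.array([[0.0], [1.0]])
T = 50

# Roll out x_{t+1} = A x_t + B u_t + w_t with random exploratory inputs.
x = np.zeros((T + 1, 2))
u = rng.standard_normal((T, 1))
for t in range(T):
    x[t + 1] = A @ x[t] + B @ u[t] + 0.1 * rng.standard_normal(2)

# OLS: regress x_{t+1} on z_t = [x_t, u_t]; Theta.T estimates [A B].
Z = np.hstack([x[:-1], u])                # shape (T, 3)
Theta, *_ = np.linalg.lstsq(Z, x[1:], rcond=None)
AB_hat = Theta.T                          # shape (2, 3)

def uniform_quantize(M, step):
    """Round each entry to the nearest multiple of `step` (illustrative quantizer)."""
    return step * np.round(M / step)

step = 2.0 ** -6
AB_q = uniform_quantize(AB_hat, step)     # what would be sent over the uplink
```

Each quantized entry differs from the estimate by at most half a step, so refining the step shrinks the extra error introduced by transmission.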

LG Apr 13
Distributionally Robust K-Means Clustering

Vikrant Malik, Taylan Kargin, Babak Hassibi

K-means clustering is a workhorse of unsupervised learning, but it is notoriously brittle to outliers, distribution shifts, and limited sample sizes. Viewing k-means as Lloyd--Max quantization of the empirical distribution, we develop a distributionally robust variant that protects against such pathologies. We posit that the unknown population distribution lies within a Wasserstein-2 ball around the empirical distribution. In this setting, one seeks cluster centers that minimize the worst-case expected squared distance over this ambiguity set, leading to a minimax formulation. A tractable dual yields a soft-clustering scheme that replaces hard assignments with smoothly weighted ones. We propose an efficient block coordinate descent algorithm with provable monotonic decrease and local linear convergence. Experiments on standard benchmarks and large-scale synthetic data demonstrate substantial gains in outlier detection and robustness to noise.
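
As a baseline for comparison, the classical hard-assignment Lloyd iteration that the abstract starts from looks roughly like this. It is plain k-means only; the distributionally robust soft-clustering scheme and its block coordinate descent are not reproduced here, and the synthetic data and helper name `lloyd` are assumptions.

```python
import numpy as np

def lloyd(X, centers, iters=20):
    """Plain Lloyd iterations; returns final centers and the cost at each iteration."""
    costs = []
    for _ in range(iters):
        # Squared distance from every point to every center, shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)                      # hard assignment step
        costs.append(d2[np.arange(len(X)), assign].mean())
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[assign == j]
            if len(members) > 0:                        # keep empty clusters in place
                new_centers[j] = members.mean(axis=0)   # centroid update step
        centers = new_centers
    return centers, costs

rng = np.random.default_rng(0)
X = np.vstack([
    rng.standard_normal((100, 2)) + np.array([4.0, 0.0]),
    rng.standard_normal((100, 2)) - np.array([4.0, 0.0]),
])
init = X[rng.choice(len(X), size=2, replace=False)]
centers, costs = lloyd(X, init)
```

The recorded cost never increases from one iteration to the next, but a single far-away outlier can drag a centroid, which is the brittleness the robust formulation targets.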

ML Feb 22
Implicit Bias and Convergence of Matrix Stochastic Mirror Descent

Danil Akhtiamov, Reza Ghane, Babak Hassibi

We investigate Stochastic Mirror Descent (SMD) with matrix parameters and vector-valued predictions, a framework relevant to multi-class classification and matrix completion problems. Focusing on the overparameterized regime, where the total number of parameters exceeds the number of training samples, we prove that SMD with matrix mirror functions $\psi(\cdot)$ converges exponentially to a global interpolator. Furthermore, we generalize classical implicit bias results of vector SMD by demonstrating that the matrix SMD algorithm converges to the unique solution minimizing the Bregman divergence induced by $\psi(\cdot)$ from initialization subject to interpolating the data. These findings reveal how matrix mirror maps dictate inductive bias in high-dimensional, multi-output problems.

ML Mar 19
Precise Performance of Linear Denoisers in the Proportional Regime

Reza Ghane, Danil Akhtiamov, Babak Hassibi

In the present paper we study the performance of linear denoisers for noisy data of the form $\mathbf{x} + \mathbf{z}$, where $\mathbf{x} \in \mathbb{R}^d$ is the desired data with zero mean and unknown covariance $\mathbf{\Sigma}$, and $\mathbf{z} \sim \mathcal{N}(0, \mathbf{\Sigma}_{\mathbf{z}})$ is additive noise. Since the covariance $\mathbf{\Sigma}$ is not known, the standard Wiener filter cannot be employed for denoising. Instead we assume we are given samples $\mathbf{x}_1,\dots,\mathbf{x}_n \in \mathbb{R}^d$ from the true distribution. A standard approach would then be to estimate $\mathbf{\Sigma}$ from the samples and use it to construct an "empirical" Wiener filter. However, in this paper, motivated by the denoising step in diffusion models, we take a different approach whereby we train a linear denoiser $\mathbf{W}$ from the data itself. In particular, we synthetically construct noisy samples $\hat{\mathbf{x}}_i$ of the data by injecting the samples with Gaussian noise with covariance $\mathbf{\Sigma}_1 \neq \mathbf{\Sigma}_{\mathbf{z}}$ and find the best $\mathbf{W}$ that approximates $\mathbf{W}\hat{\mathbf{x}}_i \approx \mathbf{x}_i$ in a least-squares sense. In the proportional regime $\frac{n}{d} \rightarrow \kappa > 1$ we use the Convex Gaussian Min-Max Theorem (CGMT) to analytically find the closed form expression for the generalization error of the denoiser obtained from this process. Using this expression one can optimize over $\mathbf{\Sigma}_1$ to find the best possible denoiser. Our numerical simulations show that our denoiser outperforms the "empirical" Wiener filter in many scenarios and approaches the optimal Wiener filter as $\kappa \rightarrow \infty$.
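
The two objects compared in this abstract can be written side by side. The display below is a sketch under stated assumptions: the signal and noise are taken to be independent, and the symbol $\mathbf{z}_i'$ for the synthetically injected noise is notation introduced here, not the paper's.

```latex
\mathbf{W}_{\text{Wiener}} = \mathbf{\Sigma} \left( \mathbf{\Sigma} + \mathbf{\Sigma}_{\mathbf{z}} \right)^{-1},
\qquad
\hat{\mathbf{W}} \in \arg\min_{\mathbf{W}} \sum_{i=1}^{n} \left\| \mathbf{W} \hat{\mathbf{x}}_i - \mathbf{x}_i \right\|_2^2,
\quad
\hat{\mathbf{x}}_i = \mathbf{x}_i + \mathbf{z}_i', \;\;
\mathbf{z}_i' \sim \mathcal{N}(0, \mathbf{\Sigma}_1).
```

The first expression needs the unknown $\mathbf{\Sigma}$, while the second uses only the samples and the chosen training-noise covariance $\mathbf{\Sigma}_1$.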

IT May 9
On Codes with Support-Constrained Parity Checks

Barron Han, Hikmet Yildiz, Babak Hassibi

We study linear codes that maximize minimum distance subject to arbitrary support constraints on the parity-check matrix. Such constraints arise naturally in the design of LDPC codes, locally repairable codes, and hardware-constrained systems where each parity check must involve only a limited number of code symbols. They are also essential in quantum error correction, where sparse stabilizers reduce measurement noise and respect the connectivity constraints of physical qubit architectures. We derive the optimal minimum distance possible given support constraints on the parity-check matrix and show it is achievable over sufficiently large fields. When this maximum distance coincides with the Singleton bound for unconstrained parity check matrices, the dual GM-MDS construction yields generalized Reed--Solomon codes obeying the mask. In the generator-matrix setting, the GM-MDS theorem guarantees that the optimal distance can always be achieved by a subcode of a generalized Reed--Solomon code while satisfying arbitrary support constraints. We show that this is not true for the parity-check setting. We exhibit a set of support constraints, derived from the vertex-edge incidence of $K_{6,6}$, for which the optimal minimum distance cannot be realized by any subcode of a generalized Reed--Solomon code over any field. We also analyze structured constraint families -- regular, balanced, and cyclic masks -- through numerical optimization, providing design guidance for practical code constructions.

LG Feb 12, 2024
A Novel Gaussian Min-Max Theorem and its Applications

Danil Akhtiamov, David Bosch, Reza Ghane et al.

A celebrated result by Gordon allows one to compare the min-max behavior of two Gaussian processes if certain inequality conditions are met. The consequences of this result include the Gaussian min-max (GMT) and convex Gaussian min-max (CGMT) theorems which have had far-reaching implications in high-dimensional statistics, machine learning, non-smooth optimization, and signal processing. Both theorems rely on a pair of Gaussian processes, first identified by Slepian, that satisfy Gordon's comparison inequalities. In this paper, we identify such a new pair. The resulting theorems extend the classical GMT and CGMT Theorems from the case where the underlying Gaussian matrix in the primary process has iid rows to where it has independent but non-identically-distributed ones. The new CGMT is applied to the problems of multi-source Gaussian regression, as well as to binary classification of general Gaussian mixture models.

LG Feb 16, 2024
One-Bit Quantization and Sparsification for Multiclass Linear Classification with Strong Regularization

Reza Ghane, Danil Akhtiamov, Babak Hassibi

We study the use of linear regression for multiclass classification in the over-parametrized regime where some of the training data is mislabeled. In such scenarios it is necessary to add an explicit regularization term, $λf(w)$, for some convex function $f(\cdot)$, to avoid overfitting the mislabeled data. In our analysis, we assume that the data is sampled from a Gaussian Mixture Model with equal class sizes, and that a proportion $c$ of the training labels is corrupted for each class. Under these assumptions, we prove that the best classification performance is achieved when $f(\cdot) = \|\cdot\|^2_2$ and $λ\to \infty$. We then proceed to analyze the classification errors for $f(\cdot) = \|\cdot\|_1$ and $f(\cdot) = \|\cdot\|_\infty$ in the large $λ$ regime and notice that it is often possible to find sparse and one-bit solutions, respectively, that perform almost as well as the one corresponding to $f(\cdot) = \|\cdot\|_2^2$.

LGJun 20, 2025
Optimal Implicit Bias in Linear Regression

Kanumuri Nithin Varma, Babak Hassibi

Most modern learning problems are over-parameterized, where the number of learnable parameters is much greater than the number of training data points. In this over-parameterized regime, the training loss typically has infinitely many global optima that completely interpolate the data with varying generalization performance. The particular global optimum we converge to depends on the implicit bias of the optimization algorithm. The question we address in this paper is, "What is the implicit bias that leads to the best generalization performance?" To find the optimal implicit bias, we provide a precise asymptotic analysis of the generalization performance of interpolators obtained from the minimization of convex functions/potentials for over-parameterized linear regression with non-isotropic Gaussian data. In particular, we obtain a tight lower bound on the best generalization error possible among this class of interpolators in terms of the over-parameterization ratio, the variance of the noise in the labels, the eigenspectrum of the data covariance, and the underlying distribution of the parameter to be estimated. Finally, we find the optimal convex implicit bias that achieves this lower bound under certain sufficient conditions involving the log-concavity of the distribution of a Gaussian convolved with the prior of the true underlying parameter.

LGOct 17, 2025
One-Bit Quantization for Random Features Models

Danil Akhtiamov, Reza Ghane, Babak Hassibi

Recent advances in neural networks have led to significant computational and memory demands, spurring interest in one-bit weight compression to enable efficient inference on resource-constrained devices. However, the theoretical underpinnings of such compression remain poorly understood. We address this gap by analyzing one-bit quantization in the Random Features model, a simplified framework that corresponds to neural networks with random representations. We prove that, asymptotically, quantizing the weights of all layers except the last incurs no loss in generalization error compared to the full-precision random features model. Our findings offer theoretical insights into neural network compression. We also demonstrate empirically that one-bit quantization leads to significant inference speedups for Random Features models even on a laptop GPU, confirming the practical benefits of our work. Additionally, we provide an asymptotically precise characterization of the generalization error for Random Features with an arbitrary number of layers. To the best of our knowledge, our analysis yields more general results than all previous works in the related literature.

LGOct 16, 2025
Learn to Change the World: Multi-level Reinforcement Learning with Model-Changing Actions

Ziqing Lu, Babak Hassibi, Lifeng Lai et al.

Reinforcement learning usually assumes a given or sometimes even fixed environment in which an agent seeks an optimal policy to maximize its long-term discounted reward. In contrast, we consider agents that are not limited to passive adaptations: they instead have model-changing actions that actively modify the RL model of world dynamics itself. Reconfiguring the underlying transition processes can potentially increase the agents' rewards. Motivated by this setting, we introduce the multi-layer configurable time-varying Markov decision process (MCTVMDP). In an MCTVMDP, the lower-level MDP has a non-stationary transition function that is configurable through upper-level model-changing actions. The agent's objective consists of two parts: Optimize the configuration policies in the upper-level MDP and optimize the primitive action policies in the lower-level MDP to jointly improve its expected long-term reward.

MLJan 13, 2025
Gaussian Universality for Diffusion Models

Reza Ghane, Anthony Bao, Danil Akhtiamov et al.

We investigate Gaussian Universality for data distributions generated via diffusion models. By Gaussian Universality we mean that the test error of a generalized linear model $f(\mathbf{W})$ trained for a classification task on the diffusion data matches the test error of $f(\mathbf{W})$ trained on the Gaussian Mixture with matching means and covariances per class. In other words, the test error depends only on the first and second order statistics of the diffusion-generated data in the linear setting. As a corollary, the analysis of the test error for linear classifiers can be reduced to Gaussian data from diffusion-generated data. Analysing the performance of models trained on synthetic data is a pertinent problem due to the surge of methods such as \cite{sehwag2024stretchingdollardiffusiontraining}. Moreover, we show that, for any $1$-Lipschitz scalar function $φ$, $φ(\mathbf{x})$ is close to $\mathbb{E} φ(\mathbf{x})$ with high probability for $\mathbf{x}$ sampled from the conditional diffusion model corresponding to each class. Finally, we note that current approaches for proving universality do not apply to diffusion-generated data as the covariance matrices of the data tend to have vanishing minimum singular values, contrary to the assumption made in the literature. This leaves extending previous mathematical universality results as an intriguing open question.

LGFeb 22, 2022
Explicit Regularization via Regularizer Mirror Descent

Navid Azizan, Sahin Lale, Babak Hassibi

Despite perfectly interpolating the training data, deep neural networks (DNNs) can often generalize fairly well, in part due to the "implicit regularization" induced by the learning algorithm. Nonetheless, various forms of regularization, such as "explicit regularization" (via weight decay), are often used to avoid overfitting, especially when the data is corrupted. There are several challenges with explicit regularization, most notably unclear convergence properties. Inspired by convergence properties of stochastic mirror descent (SMD) algorithms, we propose a new method for training DNNs with regularization, called regularizer mirror descent (RMD). In highly overparameterized DNNs, SMD simultaneously interpolates the training data and minimizes a certain potential function of the weights. RMD starts with a standard cost which is the sum of the training loss and a convex regularizer of the weights. Reinterpreting this cost as the potential of an "augmented" overparameterized network and applying SMD yields RMD. As a result, RMD inherits the properties of SMD and provably converges to a point "close" to the minimizer of this cost. RMD is computationally comparable to stochastic gradient descent (SGD) and weight decay, and is parallelizable in the same manner. Our experimental results on training sets with various levels of corruption suggest that the generalization performance of RMD is remarkably robust and significantly better than both SGD and weight decay, which implicitly and explicitly regularize the $\ell_2$ norm of the weights. RMD can also be used to regularize the weights to a desired weight vector, which is particularly relevant for continual learning.

LGOct 24, 2021
Online estimation and control with optimal pathlength regret

Gautam Goel, Babak Hassibi

A natural goal when designing online learning algorithms for non-stationary environments is to bound the regret of the algorithm in terms of the temporal variation of the input sequence. Intuitively, when the variation is small, it should be easier for the algorithm to achieve low regret, since past observations are predictive of future inputs. Such data-dependent "pathlength" regret bounds have recently been obtained for a wide variety of online learning problems, including OCO and bandits. We obtain the first pathlength regret bounds for online control and estimation (e.g. Kalman filtering) in linear dynamical systems. The key idea in our derivation is to reduce pathlength-optimal filtering and control to certain variational problems in robust estimation and control; these reductions may be of independent interest. Numerical simulations confirm that our pathlength-optimal algorithms outperform traditional $H_2$ and $H_{\infty}$ algorithms when the environment varies over time.

LGOct 5, 2021
How to Query An Oracle? Efficient Strategies to Label Data

Farshad Lahouti, Victoria Kostina, Babak Hassibi

We consider the basic problem of querying an expert oracle for labeling a dataset in machine learning. This is typically an expensive and time-consuming process, and we therefore seek ways to do so efficiently. The conventional approach involves comparing each sample with (the representative of) each class to find a match. In a setting with $N$ equally likely classes, this involves $N/2$ pairwise comparisons (queries per sample) on average. We consider a $k$-ary query scheme with $k\ge 2$ samples in a query that identifies (dis)similar items in the set while effectively exploiting the associated transitive relations. We present a randomized batch algorithm that operates on a round-by-round basis to label the samples and achieves a query rate of $O(\frac{N}{k^2})$. In addition, we present an adaptive greedy query scheme, which achieves an average rate of $\approx 0.2N$ queries per sample with triplet queries. For the proposed algorithms, we investigate the query rate performance analytically and with simulations. Empirical studies suggest that each triplet query takes an expert at most 50\% more time compared with a pairwise query, indicating the effectiveness of the proposed $k$-ary query schemes. We generalize the analyses to nonuniform class distributions when possible.
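The $N/2$ figure for the conventional approach can be checked with a short calculation. The sketch below is an illustration only and makes assumptions not stated in the abstract: the sample is compared with class representatives one at a time in a fixed order, the scan stops at the first match, and (optionally) the last class is inferred without a query once all others have been ruled out. The function name `expected_pairwise_queries` and its `infer_last` flag are hypothetical.

```python
from fractions import Fraction


def expected_pairwise_queries(num_classes: int, infer_last: bool = True) -> Fraction:
    """Exact expected number of pairwise queries for a sequential scan.

    Assumes the true class is uniform over `num_classes` positions and the
    scan stops at the first matching representative.
    """
    total = Fraction(0)
    for position in range(1, num_classes + 1):
        queries = position
        if infer_last:
            # After num_classes - 1 mismatches the remaining class is implied.
            queries = min(position, num_classes - 1)
        total += queries
    return total / num_classes


# Illustrative values: both variants are close to N/2 for N = 10.
with_inference = expected_pairwise_queries(10)
without_inference = expected_pairwise_queries(10, infer_last=False)
```

Under these assumptions the exact averages are $(N-1)(N+2)/(2N)$ and $(N+1)/2$ respectively, both of which behave like $N/2$ for large $N$.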

LGAug 26, 2021
Finite-time System Identification and Adaptive Control in Autoregressive Exogenous Systems

Sahin Lale, Kamyar Azizzadenesheli, Babak Hassibi et al.

Autoregressive exogenous (ARX) systems are the general class of input-output dynamical systems used for modeling stochastic linear dynamical systems (LDS) including partially observable LDS such as LQG systems. In this work, we study the problem of system identification and adaptive control of unknown ARX systems. We provide finite-time learning guarantees for the ARX systems under both open-loop and closed-loop data collection. Using these guarantees, we design adaptive control algorithms for unknown ARX systems with arbitrary strongly convex or convex quadratic regulating costs. Under strongly convex cost functions, we design an adaptive control algorithm based on online gradient descent to design and update the controllers that are constructed via a convex controller reparametrization. We show that our algorithm has $\tilde{\mathcal{O}}(\sqrt{T})$ regret via an explore-and-commit approach and that, if the model estimates are updated in epochs using closed-loop data collection, it attains the optimal regret of $\text{polylog}(T)$ after $T$ time-steps of interaction. For the case of convex quadratic cost functions, we propose an adaptive control algorithm that deploys the optimism in the face of uncertainty principle to design the controller. In this setting, we show that the explore-and-commit approach has a regret upper bound of $\tilde{\mathcal{O}}(T^{2/3})$, and the adaptive control with continuous model estimate updates attains $\tilde{\mathcal{O}}(\sqrt{T})$ regret after $T$ time-steps.

OCJul 28, 2021
Competitive Control

Gautam Goel, Babak Hassibi

We consider control from the perspective of competitive analysis. Unlike much prior work on learning-based control, which focuses on minimizing regret against the best controller selected in hindsight from some specific class, we focus on designing an online controller which competes against a clairvoyant offline optimal controller. A natural performance metric in this setting is competitive ratio, which is the ratio between the cost incurred by the online controller and the cost incurred by the offline optimal controller. Using operator-theoretic techniques from robust control, we derive a computationally efficient state-space description of the controller with optimal competitive ratio in both finite-horizon and infinite-horizon settings. We extend competitive control to nonlinear systems using Model Predictive Control (MPC) and present numerical experiments which show that our competitive controller can significantly outperform standard $H_2$ and $H_{\infty}$ controllers in the MPC setting.

LGJun 22, 2021
Regret-optimal Estimation and Control

Gautam Goel, Babak Hassibi

We consider estimation and control in linear time-varying dynamical systems from the perspective of regret minimization. Unlike most prior work in this area, we focus on the problem of designing causal estimators and controllers which compete against a clairvoyant noncausal policy, instead of the best policy selected in hindsight from some fixed parametric class. We show that the regret-optimal estimator and regret-optimal controller can be derived in state-space form using operator-theoretic techniques from robust control and present tight, data-dependent bounds on the regret incurred by our algorithms in terms of the energy of the disturbances. Our results can be viewed as extending traditional robust estimation and control, which focuses on minimizing worst-case cost, to minimizing worst-case regret. We propose regret-optimal analogs of Model-Predictive Control (MPC) and the Extended Kalman Filter (EKF) for systems with nonlinear dynamics and present numerical experiments which show that our regret-optimal algorithms can significantly outperform standard approaches to estimation and control.

OCMay 4, 2021
Regret-Optimal LQR Control

Oron Sabag, Gautam Goel, Sahin Lale et al.

We consider the infinite-horizon LQR control problem. Motivated by competitive analysis in online learning, as a criterion for controller design we introduce the dynamic regret, defined as the difference between the LQR cost of a causal controller (that has only access to past disturbances) and the LQR cost of the \emph{unique} clairvoyant one (that has also access to future disturbances) that is known to dominate all other controllers. The regret itself is a function of the disturbances, and we propose to find a causal controller that minimizes the worst-case regret over all bounded energy disturbances. The resulting controller has the interpretation of guaranteeing the smallest regret compared to the best non-causal controller that can see the future. We derive explicit formulas for the optimal regret and for the regret-optimal controller for the state-space setting. These explicit solutions are obtained by showing that the regret-optimal control problem can be reduced to a Nehari extension problem that can be solved explicitly. The regret-optimal controller is shown to be linear and can be expressed as the sum of the classical $H_2$ state-feedback law and an $n$-th order controller ($n$ is the state dimension), and its construction simply requires a solution to the standard LQR Riccati equation and two Lyapunov equations. Simulations over a range of plants demonstrate that the regret-optimal controller interpolates nicely between the $H_2$ and the $H_\infty$ optimal controllers, and generally has $H_2$ and $H_\infty$ costs that are simultaneously close to their optimal values. The regret-optimal controller thus presents itself as a viable option for control systems design.

OCJan 25, 2021
Regret-Optimal Filtering for Prediction and Estimation

Oron Sabag, Babak Hassibi

The filtering problem of causally estimating a desired signal from a related observation signal is investigated through the lens of regret optimization. Classical filter designs, such as $\mathcal H_2$ (Kalman) and $\mathcal H_\infty$, minimize the average and worst-case estimation errors, respectively. As a result $\mathcal H_2$ filters are sensitive to inaccuracies in the underlying statistical model, and $\mathcal H_\infty$ filters are overly conservative since they safeguard against the worst-case scenario. We propose instead to minimize the \emph{regret} in order to design filters that perform well in different noise regimes by comparing their performance with that of a clairvoyant filter. More explicitly, we minimize the largest deviation of the squared estimation error of a causal filter from that of a non-causal filter that has access to future observations. In this sense, the regret-optimal filter will have the best competitive performance with respect to the non-causal benchmark filter no matter what the true signal and the observation process are. For the important case of signals that can be described with a time-invariant state-space, we provide an explicit construction for the regret optimal filter in the estimation (causal) and the prediction (strictly-causal) regimes. These solutions are obtained by reducing the regret filtering problem to a Nehari problem, i.e., approximating a non-causal operator by a causal one in spectral norm. The regret-optimal filters bear some resemblance to Kalman and $H_\infty$ filters: they are expressed as state-space models, inherit the finite dimension of the original state-space, and their solutions require solving algebraic Riccati equations. Numerical simulations demonstrate that regret minimization inherently interpolates between the performances of the $H_2$ and $H_\infty$ filters and is thus a viable approach for filter design.

LGDec 8, 2020
Stability and Identification of Random Asynchronous Linear Time-Invariant Systems

Sahin Lale, Oguzhan Teke, Babak Hassibi et al.

In many computational tasks and dynamical systems, asynchrony and randomization are naturally present and have been considered as ways to increase the speed and reduce the cost of computation while compromising the accuracy and convergence rate. In this work, we show the additional benefits of randomization and asynchrony on the stability of linear dynamical systems. We introduce a natural model for random asynchronous linear time-invariant (LTI) systems which generalizes the standard (synchronous) LTI systems. In this model, each state variable is updated randomly and asynchronously with some probability according to the underlying system dynamics. We examine how the mean-square stability of random asynchronous LTI systems varies with respect to randomization and asynchrony. Surprisingly, we show that the stability of random asynchronous LTI systems neither implies nor is implied by the stability of the synchronous variant of the system, and that an unstable synchronous system can be stabilized via randomization and/or asynchrony. We further study a special case of the introduced model, namely randomized LTI systems, where each state element is updated randomly with some fixed but unknown probability. We consider the problem of system identification of unknown randomized LTI systems using the precise characterization of mean-square stability via an extended Lyapunov equation. For unknown randomized LTI systems, we propose a systematic identification method to recover the underlying dynamics. Given a single input/output trajectory, our method estimates the model parameters that govern the system dynamics, the update probability of state variables, and the noise covariance using the correlation matrices of collected data and the extended Lyapunov equation. Finally, we empirically demonstrate that the proposed method consistently recovers the underlying system dynamics with the optimal rate.

SYNov 24, 2020
Regret-optimal measurement-feedback control

Gautam Goel, Babak Hassibi

We consider measurement-feedback control in linear dynamical systems from the perspective of regret minimization. Unlike most prior work in this area, we focus on the problem of designing an online controller which competes with the optimal dynamic sequence of control actions selected in hindsight, instead of the best controller in some specific class of controllers. This formulation of regret is attractive when the environment changes over time and no single controller achieves good performance over the entire time horizon. We show that in the measurement-feedback setting, unlike in the full-information setting, there is no single offline controller which outperforms every other offline controller on every disturbance, and propose a new $H_2$-optimal offline controller as a benchmark for the online controller to compete against. We show that the corresponding regret-optimal online controller can be found via a novel reduction to the classical Nehari problem from robust control and present a tight data-dependent bound on its regret.

LGOct 29, 2020
Robustifying Binary Classification to Adversarial Perturbation

Fariborz Salehi, Babak Hassibi

Despite the enormous success of machine learning models in various applications, most of these models lack resilience to (even small) perturbations in their input data. Hence, new methods to robustify machine learning models are needed. To this end, in this paper we consider the problem of binary classification with adversarial perturbations. Investigating the solution to a min-max optimization (which considers the worst-case loss in the presence of adversarial perturbations), we introduce a generalization of the max-margin classifier which takes into account the power of the adversary in manipulating the data. We refer to this classifier as the "Robust Max-margin" (RM) classifier. Under some mild assumptions on the loss function, we theoretically show that the gradient descent iterates (with sufficiently small step size) converge to the RM classifier in its direction. Therefore, the RM classifier can be studied to compute various performance measures (e.g. generalization error) of binary classification with adversarial perturbations.

MLOct 29, 2020
The Performance Analysis of Generalized Margin Maximizer (GMM) on Separable Data

Fariborz Salehi, Ehsan Abbasi, Babak Hassibi

Logistic models are commonly used for binary classification tasks. The success of such models has often been attributed to their connection to maximum-likelihood estimators. It has been shown that the gradient descent algorithm, when applied to the logistic loss, converges to the max-margin classifier (a.k.a. hard-margin SVM). The performance of the max-margin classifier has been recently analyzed. Inspired by these results, in this paper, we present and study a more general setting, where the underlying parameters of the logistic model possess certain structures (sparse, block-sparse, low-rank, etc.) and introduce a more general framework (which is referred to as "Generalized Margin Maximizer", GMM). While classical max-margin classifiers minimize the $2$-norm of the parameter vector subject to linearly separating the data, GMM minimizes an arbitrary convex function of the parameter vector. We provide a precise analysis of the performance of GMM via the solution of a system of nonlinear equations. We also provide a detailed study for three special cases: ($1$) $\ell_2$-GMM that is the max-margin classifier, ($2$) $\ell_1$-GMM which encourages sparsity, and ($3$) $\ell_{\infty}$-GMM which is often used when the parameter vector has binary entries. Our theoretical results are validated by extensive simulation results across a range of parameter values, problem instances, and model structures.

LGOct 20, 2020
Regret-optimal control in dynamic environments

Gautam Goel, Babak Hassibi

We consider control in linear time-varying dynamical systems from the perspective of regret minimization. Unlike most prior work in this area, we focus on the problem of designing an online controller which minimizes regret against the best dynamic sequence of control actions selected in hindsight (dynamic regret), instead of the best fixed controller in some specific class of controllers (static regret). This formulation is attractive when the environment changes over time and no single controller achieves good performance over the entire time horizon. We derive the state-space structure of the regret-optimal controller via a novel reduction to $H_{\infty}$ control and present a tight data-dependent bound on its regret in terms of the energy of the disturbance. Our results easily extend to the model-predictive setting where the controller can anticipate future disturbances and to settings where the controller only affects the system dynamics after a fixed delay. We present numerical experiments which show that our regret-optimal controller interpolates between the performance of the $H_2$-optimal and $H_{\infty}$-optimal controllers across stochastic and adversarial environments.

LGJul 23, 2020
Reinforcement Learning with Fast Stabilization in Linear Dynamical Systems

Sahin Lale, Kamyar Azizzadenesheli, Babak Hassibi et al.

In this work, we study model-based reinforcement learning (RL) in unknown stabilizable linear dynamical systems. When learning a dynamical system, one needs to stabilize the unknown dynamics in order to avoid system blow-ups. We propose an algorithm that certifies fast stabilization of the underlying system by effectively exploring the environment with an improved exploration strategy. We show that the proposed algorithm attains $\tilde{\mathcal{O}}(\sqrt{T})$ regret after $T$ time steps of agent-environment interaction. We also show that the regret of the proposed algorithm has only a polynomial dependence in the problem dimensions, which gives an exponential improvement over the prior methods. Our improved exploration method is simple, yet efficient, and it combines a sophisticated exploration policy in RL with an isotropic exploration strategy to achieve fast stabilization and improved regret. We empirically demonstrate that the proposed algorithm outperforms other popular methods in several adaptive control tasks.

LGMar 25, 2020
Logarithmic Regret Bound in Partially Observable Linear Dynamical Systems

Sahin Lale, Kamyar Azizzadenesheli, Babak Hassibi et al.

We study the problem of system identification and adaptive control in partially observable linear dynamical systems. Adaptive and closed-loop system identification is a challenging problem due to correlations introduced in data collection. In this paper, we present the first model estimation method with finite-time guarantees in both open and closed-loop system identification. Deploying this estimation method, we propose adaptive control online learning (AdaptOn), an efficient reinforcement learning algorithm that adaptively learns the system dynamics and continuously updates its controller through online learning steps. AdaptOn estimates the model dynamics by occasionally solving a linear regression problem through interactions with the environment. Using policy re-parameterization and the estimated model, AdaptOn constructs counterfactual loss functions to be used for updating the controller through online gradient descent. Over time, AdaptOn improves its model estimates and obtains more accurate gradient updates to improve the controller. We show that AdaptOn achieves a regret upper bound of $\text{polylog}\left(T\right)$, after $T$ time steps of agent-environment interaction. To the best of our knowledge, AdaptOn is the first algorithm that achieves $\text{polylog}\left(T\right)$ regret in adaptive control of unknown partially observable linear dynamical systems which includes linear quadratic Gaussian (LQG) control.

LGMar 12, 2020
Adaptive Control and Regret Minimization in Linear Quadratic Gaussian (LQG) Setting

Sahin Lale, Kamyar Azizzadenesheli, Babak Hassibi et al.

We study the problem of adaptive control in partially observable linear quadratic Gaussian control systems, where the model dynamics are unknown a priori. We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty, to effectively minimize the overall control cost. We employ the predictor state evolution representation of the system dynamics and deploy a recently proposed closed-loop system identification method, estimation, and confidence bound construction. LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model for further exploration and exploitation. We provide stability guarantees for LqgOpt and prove the regret upper bound of $\tilde{\mathcal{O}}(\sqrt{T})$ for adaptive control of linear quadratic Gaussian (LQG) systems, where $T$ is the time horizon of the problem.

OCFeb 7, 2020
The Power of Linear Controllers in LQR Control

Gautam Goel, Babak Hassibi

The Linear Quadratic Regulator (LQR) framework considers the problem of regulating a linear dynamical system perturbed by environmental noise. We compute the policy regret between three distinct control policies: i) the optimal online policy, whose linear structure is given by the Riccati equations; ii) the optimal offline linear policy, which is the best linear state feedback policy given the noise sequence; and iii) the optimal offline policy, which selects the globally optimal control actions given the noise sequence. We fully characterize the optimal offline policy and show that it has a recursive form in terms of the optimal online policy and future disturbances. We also show that the cost of the optimal offline linear policy converges to the cost of the optimal online policy as the time horizon grows large, and consequently the optimal offline linear policy incurs linear regret relative to the optimal offline policy, even in the optimistic setting where the noise is drawn i.i.d. from a known distribution. Although we focus on the setting where the noise is stochastic, our results also imply new lower bounds on the policy regret achievable when the noise is chosen by an adaptive adversary.
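As a reminder of how the Riccati equations determine the linear structure of the optimal online policy, here is a minimal sketch for a scalar system. It is a textbook finite-horizon recursion, not the paper's construction; the scalar dynamics $x_{t+1} = a x_t + b u_t + w_t$, the stage cost $q x_t^2 + r u_t^2$, the terminal cost $q x_T^2$, and the function name `scalar_lqr_riccati` are all assumptions introduced for illustration.

```python
def scalar_lqr_riccati(a, b, q, r, horizon):
    """Backward Riccati recursion for a scalar LQR problem.

    Returns the cost-to-go coefficient at time 0 and the feedback gains
    k_0, ..., k_{horizon-1}, where the control law is u_t = -k_t * x_t.
    """
    p = q  # terminal cost coefficient P_T (assumed equal to q)
    gains = []
    for _ in range(horizon):
        k = a * b * p / (r + b * b * p)   # gain computed from P_{t+1}
        p = q + a * a * p - a * b * p * k  # Riccati update giving P_t
        gains.append(k)
    gains.reverse()  # gains were produced from the last step backwards
    return p, gains


# Illustrative parameters (not from the paper): a = b = q = r = 1.
p_limit, gains = scalar_lqr_riccati(1.0, 1.0, 1.0, 1.0, horizon=50)
```

With these illustrative parameters the recursion reduces to $P \mapsto 1 + P/(1+P)$, whose fixed point is the golden ratio $(1+\sqrt{5})/2$, so `p_limit` approaches that value as the horizon grows.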

LGFeb 6, 2020
Differentially Quantized Gradient Methods

Chung-Yi Lin, Victoria Kostina, Babak Hassibi

Consider the following distributed optimization scenario. A worker has access to training data that it uses to compute the gradients while a server decides when to stop iterative computation based on its target accuracy or delay constraints. The server receives all its information about the problem instance from the worker via a rate-limited noiseless communication channel. We introduce the principle we call Differential Quantization (DQ) that prescribes compensating the past quantization errors to direct the descent trajectory of a quantized algorithm towards that of its unquantized counterpart. Assuming that the objective function is smooth and strongly convex, we prove that Differentially Quantized Gradient Descent (DQ-GD) attains a linear contraction factor of $\max\{σ_{\mathrm{GD}}, ρ_n 2^{-R}\}$, where $σ_{\mathrm{GD}}$ is the contraction factor of unquantized gradient descent (GD), $ρ_n \geq 1$ is the covering efficiency of the quantizer, and $R$ is the bitrate per problem dimension $n$. Thus at any $R\geq\log_2 (ρ_n /σ_{\mathrm{GD}})$ bits, the contraction factor of DQ-GD is the same as that of unquantized GD, i.e., there is no loss due to quantization. We show that no algorithm within a certain class can converge faster than $\max\{σ_{\mathrm{GD}}, 2^{-R}\}$. Since quantizers exist with $ρ_n \to 1$ as $n \to \infty$ (Rogers, 1963), this means that DQ-GD is asymptotically optimal. The principle of differential quantization continues to apply to gradient methods with momentum such as Nesterov's accelerated gradient descent, and Polyak's heavy ball method. For these algorithms as well, if the rate is above a certain threshold, there is no loss in contraction factor obtained by the differentially quantized algorithm compared to its unquantized counterpart. Experimental results on least-squares problems validate our theoretical analysis.
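The rate threshold follows directly from the contraction factor: $ρ_n 2^{-R} \le σ_{\mathrm{GD}}$ exactly when $R \ge \log_2(ρ_n/σ_{\mathrm{GD}})$. The short sketch below just evaluates these two expressions; the function names and the numerical values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math


def dq_gd_contraction(sigma_gd: float, rho_n: float, rate_bits: float) -> float:
    """Contraction factor max{sigma_GD, rho_n * 2^(-R)} quoted in the abstract."""
    return max(sigma_gd, rho_n * 2.0 ** (-rate_bits))


def min_lossless_rate(sigma_gd: float, rho_n: float) -> float:
    """Smallest R (bits per dimension) for which rho_n * 2^(-R) <= sigma_GD."""
    return math.log2(rho_n / sigma_gd)


# Illustrative values, not taken from the paper.
sigma_gd, rho_n = 0.5, 2.0
r_star = min_lossless_rate(sigma_gd, rho_n)            # log2(2.0 / 0.5)
at_threshold = dq_gd_contraction(sigma_gd, rho_n, r_star)
below_threshold = dq_gd_contraction(sigma_gd, rho_n, r_star - 1.0)
```

At or above `r_star` the factor equals `sigma_gd`, and below it the quantization term `rho_n * 2 ** (-R)` dominates.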

LGJan 31, 2020
Regret Minimization in Partially Observable Linear Quadratic Control

Sahin Lale, Kamyar Azizzadenesheli, Babak Hassibi et al.

We study the problem of regret minimization in partially observable linear quadratic control systems when the model dynamics are unknown a priori. We propose ExpCommit, an explore-then-commit algorithm that learns the model Markov parameters and then follows the principle of optimism in the face of uncertainty to design a controller. We propose a novel way to decompose the regret and provide an end-to-end sublinear regret upper bound for partially observable linear quadratic control. Finally, we provide stability guarantees and establish a regret upper bound of $\tilde{\mathcal{O}}(T^{2/3})$ for ExpCommit, where $T$ is the time horizon of the problem.

LGJun 10, 2019
Stochastic Mirror Descent on Overparameterized Nonlinear Models: Convergence, Implicit Regularization, and Generalization

Navid Azizan, Sahin Lale, Babak Hassibi

Most modern learning problems are highly overparameterized, meaning that there are many more parameters than the number of training data points, and as a result, the training loss may have infinitely many global minima (parameter vectors that perfectly interpolate the training data). Therefore, it is important to understand which interpolating solutions we converge to, how they depend on the initialization point and the learning algorithm, and whether they lead to different generalization performances. In this paper, we study these questions for the family of stochastic mirror descent (SMD) algorithms, of which the popular stochastic gradient descent (SGD) is a special case. Our contributions are both theoretical and experimental. On the theory side, we show that in the overparameterized nonlinear setting, if the initialization is close enough to the manifold of global minima (something that comes for free in the highly overparameterized case), SMD with sufficiently small step size converges to a global minimum that is approximately the closest one in Bregman divergence. On the experimental side, our extensive experiments on standard datasets and models, using various initializations, various mirror descents, and various Bregman divergences, consistently confirm that this phenomenon happens in deep learning. Our experiments further indicate that there is a clear difference in the generalization performance of the solutions obtained by different SMD algorithms. Experimenting on a standard image dataset and network architecture with SMD with different kinds of implicit regularization, $\ell_1$ to encourage sparsity, $\ell_2$ yielding SGD, and $\ell_{10}$ to discourage large components in the parameter vector, consistently and definitively shows that $\ell_{10}$-SMD has better generalization performance than SGD, which in turn has better generalization performance than $\ell_1$-SMD.

MLJun 10, 2019
The Impact of Regularization on High-dimensional Logistic Regression

Fariborz Salehi, Ehsan Abbasi, Babak Hassibi

Logistic regression is commonly used for modeling dichotomous outcomes. In the classical setting, where the number of observations is much larger than the number of parameters, properties of the maximum likelihood estimator in logistic regression are well understood. Recently, Sur and Candes have studied logistic regression in the high-dimensional regime, where the numbers of observations and parameters are comparable, and shown, among other things, that the maximum likelihood estimator is biased. In the high-dimensional regime the underlying parameter vector is often structured (sparse, block-sparse, finite-alphabet, etc.) and so in this paper we study regularized logistic regression (RLR), where a convex regularizer that encourages the desired structure is added to the negative of the log-likelihood function. An advantage of RLR is that it allows parameter recovery even for instances where the (unconstrained) maximum likelihood estimate does not exist. We provide a precise analysis of the performance of RLR via the solution of a system of six nonlinear equations, through which any performance metric of interest (mean, mean-squared error, probability of support recovery, etc.) can be explicitly computed. Our results generalize those of Sur and Candes and we provide a detailed study for the cases of $\ell_2^2$-RLR and sparse ($\ell_1$-regularized) logistic regression. In both cases, we obtain explicit expressions for various performance metrics and can find the values of the regularizer parameter that optimize the desired performance. The theory is validated by extensive numerical simulations across a range of parameter values and problem instances.

OCApr 3, 2019
A Stochastic Interpretation of Stochastic Mirror Descent: Risk-Sensitive Optimality

Navid Azizan, Babak Hassibi

Stochastic mirror descent (SMD) is a fairly new family of algorithms that has recently found a wide range of applications in optimization, machine learning, and control. It can be considered a generalization of the classical stochastic gradient algorithm (SGD), where instead of updating the weight vector along the negative direction of the stochastic gradient, the update is performed in a "mirror domain" defined by the gradient of a (strictly convex) potential function. This potential function, and the mirror domain it yields, provides considerable flexibility in the algorithm compared to SGD. While many properties of SMD have already been obtained in the literature, in this paper we exhibit a new interpretation of SMD, namely that it is a risk-sensitive optimal estimator when the unknown weight vector and additive noise are non-Gaussian and belong to the exponential family of distributions. The analysis also suggests a modified version of SMD, which we refer to as symmetric SMD (SSMD). The proofs rely on some simple properties of Bregman divergence, which allow us to extend results from quadratics and Gaussians to certain convex functions and exponential families in a rather seamless way.
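
To make the "mirror domain" update concrete, here is a minimal sketch of a single SMD step, assuming the potential ψ(w) = (1/q)‖w‖_q^q with q > 1 (an illustrative choice; the abstract does not fix a potential). The function names and the toy squared-loss sample are hypothetical. With q = 2 the mirror map is the identity and the step reduces to ordinary SGD.

```python
import numpy as np

def mirror_map(w, q):
    """Gradient of psi(w) = (1/q) * ||w||_q^q, applied coordinate-wise (q > 1)."""
    return np.sign(w) * np.abs(w) ** (q - 1)

def inverse_mirror_map(z, q):
    """Inverse of mirror_map: maps a mirror-domain point back to the weights."""
    return np.sign(z) * np.abs(z) ** (1.0 / (q - 1))

def smd_step(w, grad, eta, q):
    """One SMD step: move along the negative gradient in the mirror domain."""
    z = mirror_map(w, q) - eta * grad
    return inverse_mirror_map(z, q)

# Toy example (assumed): one sample of the squared loss 0.5 * (x @ w - y)^2.
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 2.0, -1.0])
y = 0.3
grad = (x @ w - y) * x
eta = 0.1

w_sgd = smd_step(w, grad, eta, q=2)   # coincides with w - eta * grad
w_l3 = smd_step(w, grad, eta, q=3)    # a different mirror geometry
```

Changing q changes only the two map functions, which is the flexibility relative to SGD that the abstract refers to.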

LGJan 28, 2019
Stochastic Linear Bandits with Hidden Low Rank Structure

Sahin Lale, Kamyar Azizzadenesheli, Anima Anandkumar et al.

High-dimensional representations often have a lower dimensional underlying structure. This is particularly the case in many decision making settings. For example, when the representation of actions is generated from a deep neural network, it is reasonable to expect a low-rank structure whereas conventional structures like sparsity are not valid anymore. Subspace recovery methods, such as Principal Component Analysis (PCA), can find the underlying low-rank structures in the feature space and reduce the complexity of the learning tasks. In this work, we propose Projected Stochastic Linear Bandit (PSLB), an algorithm for high dimensional stochastic linear bandits (SLB) when the representation of actions has an underlying low-dimensional subspace structure. PSLB deploys PCA-based projection to iteratively find the low rank structure in SLBs. We show that deploying projection methods assures dimensionality reduction and results in a tighter regret upper bound that is in terms of the dimensionality of the subspace and its properties, rather than the dimensionality of the ambient space. We modify the image classification task into the SLB setting and empirically show that, when a pre-trained DNN provides the high dimensional feature representations, deploying PSLB results in significant reduction of regret and faster convergence to an accurate model compared to state-of-the-art algorithms.
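
The projection step can be illustrated in isolation. The sketch below is not PSLB itself: it only estimates a d-dimensional subspace from a batch of action feature vectors with an uncentered SVD and builds the corresponding orthogonal projector. The function name `top_subspace_projector`, the choice not to center the data, and the synthetic features are all assumptions for the example.

```python
import numpy as np

def top_subspace_projector(X, d):
    """Orthogonal projector onto the span of the top-d right singular vectors of X.

    X has one feature vector per row; no centering is applied (assumption).
    """
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:d].T                      # ambient_dim x d orthonormal basis
    return V @ V.T

# Synthetic action features lying exactly in a 3-dimensional subspace of R^10.
rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.standard_normal((10, 3)))
X = rng.standard_normal((200, 3)) @ basis.T

P = top_subspace_projector(X, d=3)
X_proj = X @ P                        # features after projection
```

Because the synthetic features have no component outside the subspace, projecting them leaves them unchanged; with noisy features the projector would only approximate the true subspace.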