SYOct 22, 2017
CLOT Norm Minimization for Continuous Hands-off ControlNiharika Challapalli, Masaaki Nagahara, Mathukumalli Vidyasagar
In this paper, we consider hands-off control via minimization of the CLOT (Combined $L$-One and Two) norm. The maximum hands-off control is the $L^0$-optimal (or the sparsest) control among all feasible controls that are bounded by a specified value and transfer the state from a given initial state to the origin within a fixed time duration. In general, the maximum hands-off control is a bang-off-bang control taking values of $\pm 1$ and $0$. For many real applications, such discontinuity in the control is not desirable. To obtain a continuous but still relatively sparse control, we propose to use the CLOT norm, a convex combination of $L^1$ and $L^2$ norms. We show by numerical simulations that the CLOT control is continuous and much sparser (i.e. has longer time duration on which the control takes 0) than the conventional EN (elastic net) control, which is a convex combination of $L^1$ and squared $L^2$ norms. We also prove that the CLOT control is continuous in the sense that, if $O(h)$ denotes the sampling period, then the difference between successive values of the CLOT-optimal control is $O(\sqrt{h})$, which is a form of continuity. Also, the CLOT formulation is extended to encompass constraints on the state variable.
LGApr 3, 2023
A Tutorial Introduction to Reinforcement LearningMathukumalli Vidyasagar
In this paper, we present a brief survey of Reinforcement Learning (RL), with particular emphasis on Stochastic Approximation (SA) as a unifying theme. The scope of the paper includes Markov Reward Processes, Markov Decision Processes, Stochastic Approximation algorithms, and widely used algorithms such as Temporal Difference Learning and $Q$-learning.
SYNov 7, 2016
Continuous Hands-off Control by CLOT Norm MinimizationNiharika Challapalli, Masaaki Nagahara, Mathukumalli Vidyasagar
In this paper, we consider hands-off control via minimization of the CLOT (Combined L-One and Two) norm. The maximum hands-off control is the L0-optimal (or the sparsest) control among all feasible controls that are bounded by a specified value and transfer the state from a given initial state to the origin within a fixed time duration. In general, the maximum hands-off control is a bang-off-bang control taking values of +1, -1, and 0. For many real applications, such discontinuity in the control is not desirable. To obtain a continuous but still relatively sparse control, we propose to use the CLOT norm, a convex combination of L1 and L2 norms. We show by numerical simulation that the CLOT control is continuous and much sparser (i.e. has longer time duration on which the control takes 0) than the conventional EN (elastic net) control, which is a convex combination of L1 and squared L2 norms.
MESep 15, 2022
Estimating large causal polytrees from small samplesSourav Chatterjee, Mathukumalli Vidyasagar
We consider the problem of estimating a large causal polytree from a relatively small i.i.d. sample. This is motivated by the problem of determining causal structure when the number of variables is very large compared to the sample size, such as in gene regulatory networks. We give an algorithm that recovers the tree with high accuracy in such settings. The algorithm works under essentially no distributional or modeling assumptions other than some mild non-degeneracy conditions.
SYSep 7, 2011
A Metric Between Probability Distributions on Finite Sets of Different Cardinalities and Applications to Order ReductionMathukumalli Vidyasagar
With increasing use of digital control it is natural to view control inputs and outputs as stochastic processes assuming values over finite alphabets rather than in a Euclidean space. As control over networks becomes increasingly common, data compression by reducing the size of the input and output alphabets without losing the fidelity of representation becomes relevant. This requires us to define a notion of distance between two stochastic processes assuming values in distinct sets, possibly of different cardinalities. If the two processes are i.i.d., then the problem becomes one of defining a metric between two probability distributions over distinct finite sets of possibly different cardinalities. This is the problem addressed in the present paper. A metric is defined in terms of a joint distribution on the product of the two sets, which has the two given distributions as its marginals, and has minimum entropy. Computing the metric exactly turns out to be NP-hard. Therefore an efficient greedy algorithm is presented for finding an upper bound on the distance. This problem also turns out to be NP-hard, so again a greedy algorithm is constructed for finding a suboptimal reduced order approximation. Taken together, all the results presented here permit the approximation of an i.i.d. process over a set of large cardinality by another i.i.d. process over a set of smaller cardinality. In future work, attempts will be made to extend this work to Markov processes over finite sets.
33.2MLMar 15
Convergence of Two Time-Scale Stochastic Approximation: A Martingale ApproachMathukumalli Vidyasagar
In this paper, we analyze the two time-scale stochastic approximation (TTSSA) algorithm introduced in Borkar (1997) using a martingale approach. This approach leads to simple sufficient conditions for the iterations to be bounded almost surely, as well as estimates on the rate of convergence of the mean-squared error of the TTSSA algorithm to zero. Our theory is applicable to nonlinear equations, in contrast to many papers in the TTSSA literature which assume that the equations are linear. The convergence of TTSSA is proved in the "almost sure" sense, in contrast to earlier papers on TTSSA that establish convergence in distribution, convergence in the mean, and the like. Moreover, in this paper we establish different rates of convergence for the fast and the slow subsystems, perhaps for the first time. Finally, all of the above results to continue to hold in the case where the two measurement errors have nonzero conditional mean, and/or have conditional variances that grow without bound as the iterations proceed. This is in contrast to previous papers which assumed that the errors form a martingale difference sequence with uniformly bounded conditional variance. It is shown that when the measurement errors have zero conditional mean and the conditional variance remains bounded, the mean-squared error of the iterations converges to zero at a rate of $o(t^{-η})$ for all $η\in (0,1)$. This improves upon the rate of $O(t^{-2/3})$ proved in Doan (2023) (which is the best bound available to date). Our bound is virtually the same as the rate of $O(t^{-1})$ proved in Doan (2024), but for a Polyak-Ruppert averaged version of TTSSA, and not directly. Rates of convergence are also established for the case where the errors have nonzero conditional mean and/or unbounded conditional variance.
OCJun 13, 2025
Convergence of Momentum-Based Optimization Algorithms with Time-Varying ParametersMathukumalli Vidyasagar
In this paper, we present a unified algorithm for stochastic optimization that makes use of a "momentum" term; in other words, the stochastic gradient depends not only on the current true gradient of the objective function, but also on the true gradient at the previous iteration. Our formulation includes the Stochastic Heavy Ball (SHB) and the Stochastic Nesterov Accelerated Gradient (SNAG) algorithms as special cases. In addition, in our formulation, the momentum term is allowed to vary as a function of time (i.e., the iteration counter). The assumptions on the stochastic gradient are the most general in the literature, in that it can be biased, and have a conditional variance that grows in an unbounded fashion as a function of time. This last feature is crucial in order to make the theory applicable to "zero-order" methods, where the gradient is estimated using just two function evaluations. We present a set of sufficient conditions for the convergence of the unified algorithm. These conditions are natural generalizations of the familiar Robbins-Monro and Kiefer-Wolfowitz-Blum conditions for standard stochastic gradient descent. We also analyze another method from the literature for the SHB algorithm with a time-varying momentum parameter, and show that it is impracticable.
MLOct 8, 2019
New and Explicit Constructions of Unbalanced Ramanujan Bipartite GraphsShantanu Prasad Burnwal, Kaneenika Sinha, Mathukumalli Vidyasagar
The objectives of this article are three-fold. Firstly, we present for the first time explicit constructions of an infinite family of \textit{unbalanced} Ramanujan bigraphs. Secondly, we revisit some of the known methods for constructing Ramanujan graphs and discuss the computational work required in actually implementing the various construction methods. The third goal of this article is to address the following question: can we construct a bipartite Ramanujan graph with specified degrees, but with the restriction that the edge set of this graph must be distinct from a given set of "prohibited" edges? We provide an affirmative answer in many cases, as long as the set of prohibited edges is not too large.
MLAug 2, 2019
Deterministic Completion of Rectangular Matrices Using Asymmetric Ramanujan Graphs: Exact and Stable RecoveryShantanu Prasad Burnwal, Mathukumalli Vidyasagar
In this paper we study the matrix completion problem: Suppose $X \in {\mathbb R}^{n_r \times n_c}$ is unknown except for a known upper bound $r$ on its rank. By measuring a small number $m \ll n_r n_c$ of elements of $X$, is it possible to recover $X$ exactly with noise-free measurements, or to construct a good approximation of $X$ with noisy measurements? Existing solutions to these problems involve sampling the elements uniformly and at random, and can guarantee exact recovery of the unknown matrix only with high probability. In this paper, we present a \textit{deterministic} sampling method for matrix completion. We achieve this by choosing the sampling set as the edge set of an asymmetric Ramanujan bigraph, and constrained nuclear norm minimization is the recovery method. Specifically, we derive sufficient conditions under which the unknown matrix is completed exactly with noise-free measurements, and is approximately completed with noisy measurements, which we call "stable" completion. The conditions derived here are only sufficient and more restrictive than random sampling. To study how close they are to being necessary, we conducted numerical simulations on randomly generated low rank matrices, using the LPS families of Ramanujan graphs. These simulations demonstrate two facts: (i) In order to achieve exact completion, it appears sufficient to choose the degree $d$ of the Ramanujan graph to be $\geq 3r$. (ii) There is a "phase transition," whereby the likelihood of success suddenly drops from 100\% to 0\% if the rank is increased by just one or two beyond a critical value. The phase transition phenomenon is well-known and well-studied in vector recovery using $\ell_1$-norm minimization. However, it is less studied in matrix completion and nuclear norm minimization, and not much understood.
MLAug 9, 2018
Compressed Sensing Using Binary Matrices of Nearly Optimal DimensionsMahsa Lotfi, Mathukumalli Vidyasagar
In this paper, we study the problem of compressed sensing using binary measurement matrices and $\ell_1$-norm minimization (basis pursuit) as the recovery algorithm. We derive new upper and lower bounds on the number of measurements to achieve robust sparse recovery with binary matrices. We establish sufficient conditions for a column-regular binary matrix to satisfy the robust null space property (RNSP) and show that the associated sufficient conditions % sparsity bounds for robust sparse recovery obtained using the RNSP are better by a factor of $(3 \sqrt{3})/2 \approx 2.6$ compared to the sufficient conditions obtained using the restricted isometry property (RIP). Next we derive universal \textit{lower} bounds on the number of measurements that any binary matrix needs to have in order to satisfy the weaker sufficient condition based on the RNSP and show that bipartite graphs of girth six are optimal. Then we display two classes of binary matrices, namely parity check matrices of array codes and Euler squares, which have girth six and are nearly optimal in the sense of almost satisfying the lower bound. In principle, randomly generated Gaussian measurement matrices are "order-optimal". So we compare the phase transition behavior of the basis pursuit formulation using binary array codes and Gaussian matrices and show that (i) there is essentially no difference between the phase transition boundaries in the two cases and (ii) the CPU time of basis pursuit with binary matrices is hundreds of times faster than with Gaussian matrices and the storage requirements are less. Therefore it is suggested that binary matrices are a viable alternative to Gaussian matrices for compressed sensing using basis pursuit. \end{abstract}
MLOct 22, 2017
An Approach to One-Bit Compressed Sensing Based on Probably Approximately Correct Learning TheoryMehmet Eren Ahsen, Mathukumalli Vidyasagar
In this paper, the problem of one-bit compressed sensing (OBCS) is formulated as a problem in probably approximately correct (PAC) learning. It is shown that the Vapnik-Chervonenkis (VC-) dimension of the set of half-spaces in $\mathbb{R}^n$ generated by $k$-sparse vectors is bounded below by $k \lg (n/k)$ and above by $2k \lg (n/k)$, plus some round-off terms. By coupling this estimate with well-established results in PAC learning theory, we show that a consistent algorithm can recover a $k$-sparse vector with $O(k \lg (n/k))$ measurements, given only the signs of the measurement vector. This result holds for \textit{all} probability measures on $\mathbb{R}^n$. It is further shown that random sign-flipping errors result only in an increase in the constant in the $O(k \lg (n/k))$ estimate. Because constructing a consistent algorithm is not straight-forward, we present a heuristic based on the $\ell_1$-norm support vector machine, and illustrate that its computational performance is superior to a currently popular method.
ITAug 11, 2017
A Fast Noniterative Algorithm for Compressive Sensing Using Binary Measurement MatricesMahsa Lotfi, Mathukumalli Vidyasagar
In this paper we present a new algorithm for compressive sensing that makes use of binary measurement matrices and achieves exact recovery of ultra sparse vectors, in a single pass and without any iterations. Due to its noniterative nature, our algorithm is hundreds of times faster than $\ell_1$-norm minimization, and methods based on expander graphs, both of which require multiple iterations. Our algorithm can accommodate nearly sparse vectors, in which case it recovers index set of the largest components, and can also accommodate burst noise measurements. Compared to compressive sensing methods that are guaranteed to achieve exact recovery of all sparse vectors, our method requires fewer measurements However, methods that achieve statistical recovery, that is, recovery of almost all but not all sparse vectors, can require fewer measurements than our method.
MLJun 19, 2016
Tight Performance Bounds for Compressed Sensing With Conventional and Group SparsityShashank Ranjan, Mathukumalli Vidyasagar
In this paper, we study the problem of recovering a group sparse vector from a small number of linear measurements. In the past the common approach has been to use various "group sparsity-inducing" norms such as the Group LASSO norm for this purpose. By using the theory of convex relaxations, we show that it is also possible to use $\ell_1$-norm minimization for group sparse recovery. We introduce a new concept called group robust null space property (GRNSP), and show that, under suitable conditions, a group version of the restricted isometry property (GRIP) implies the GRNSP, and thus leads to group sparse recovery. When all groups are of equal size, our bounds are less conservative than known bounds. Moreover, our results apply even to situations where where the groups have different sizes. When specialized to conventional sparsity, our bounds reduce to one of the well-known "best possible" conditions for sparse recovery. This relationship between GRNSP and GRIP is new even for conventional sparsity, and substantially streamlines the proofs of some known results. Using this relationship, we derive bounds on the $\ell_p$-norm of the residual error vector for all $p \in [1,2]$, and not just when $p = 2$. When the measurement matrix consists of random samples of a sub-Gaussian random variable, we present bounds on the number of measurements, which are less conservative than currently known bounds.
MLOct 30, 2014
Two New Approaches to Compressed Sensing Exhibiting Both Robust Sparse Recovery and the Grouping EffectMehmet Eren Ahsen, Niharika Challapalli, Mathukumalli Vidyasagar
In this paper we introduce a new optimization formulation for sparse regression and compressed sensing, called CLOT (Combined L-One and Two), wherein the regularizer is a convex combination of the $\ell_1$- and $\ell_2$-norms. This formulation differs from the Elastic Net (EN) formulation, in which the regularizer is a convex combination of the $\ell_1$- and $\ell_2$-norm squared. It is shown that, in the context of compressed sensing, the EN formulation does not achieve robust recovery of sparse vectors, whereas the new CLOT formulation achieves robust recovery. Also, like EN but unlike LASSO, the CLOT formulation achieves the grouping effect, wherein coefficients of highly correlated columns of the measurement (or design) matrix are assigned roughly comparable values. It is already known LASSO does not have the grouping effect. Therefore the CLOT formulation combines the best features of both LASSO (robust sparse recovery) and EN (grouping effect). The CLOT formulation is a special case of another one called SGL (Sparse Group LASSO) which was introduced into the literature previously, but without any analysis of either the grouping effect or robust sparse recovery. It is shown here that SGL achieves robust sparse recovery, and also achieves a version of the grouping effect in that coefficients of highly correlated columns belonging to the same group of the measurement (or design) matrix are assigned roughly comparable values.
QMFeb 24, 2014
Machine Learning Methods in the Computational Biology of CancerMathukumalli Vidyasagar
The objectives of this "perspective" paper are to review some recent advances in sparse feature selection for regression and classification, as well as compressed sensing, and to discuss how these might be used to develop tools to advance personalized cancer therapy. As an illustration of the possibilities, a new algorithm for sparse regression is presented, and is applied to predict the time to tumor recurrence in ovarian cancer. A new algorithm for sparse feature selection in classification problems is presented, and its validation in endometrial cancer is briefly discussed. Some open problems are also presented.
MLJan 26, 2014
Near-Ideal Behavior of Compressed Sensing AlgorithmsMehmet Eren Ahsen, Mathukumalli Vidyasagar
In a recent paper, it is shown that the LASSO algorithm exhibits "near-ideal behavior," in the following sense: Suppose $y = Az + η$ where $A$ satisfies the restricted isometry property (RIP) with a sufficiently small constant, and $\Vert η\Vert_2 \leq ε$. Then minimizing $\Vert z \Vert_1$ subject to $\Vert y - Az \Vert_2 \leq ε$ leads to an estimate $\hat{x}$ whose error $\Vert \hat{x} - x \Vert_2$ is bounded by a universal constant times the error achieved by an "oracle" that knows the location of the nonzero components of $x$. In the world of optimization, the LASSO algorithm has been generalized in several directions such as the group LASSO, the sparse group LASSO, either without or with tree-structured overlapping groups, and most recently, the sorted LASSO. In this paper, it is shown that {\it any algorithm\/} exhibits near-ideal behavior in the above sense, provided only that (i) the norm used to define the sparsity index is "decomposable," (ii) the penalty norm that is minimized in an effort to enforce sparsity is "$γ$-decomposable," and (iii) a "compressibility condition" in terms of a group restricted isometry property is satisfied. Specifically, the group LASSO, and the sparse group LASSO (with some permissible overlap in the groups), as well as the sorted $\ell_1$-norm minimization all exhibit near-ideal behavior. Explicit bounds on the residual error are derived that contain previously known results as special cases.