LGMay 30
Global Convergence of Adaptive Sensing for Principal Eigenvector EstimationAlex Saad-Falcon, Brighton Ancelin, Justin Romberg
Principal component analysis classically requires full $d$-dimensional samples, yet in various applications hardware limits acquisition to a few scalar measurements per sample. We analyze a compressed variant of Oja's algorithm for estimating the principal eigenvector of the data covariance matrix using only two adaptive measurements per sample. At each iteration, we observe one measurement along the current estimate and one in a random orthogonal direction. We prove that after $t$ iterations, the expected sine-squared error to the true eigenvector is $\mathcal{O}(λ_1λ_2 d^2 / (Δ^2 t))$, where $d$ is the ambient dimension, $λ_1, λ_2$ are the leading eigenvalues, and $Δ= λ_1 - λ_2$ is the eigengap. We complement this with a matching information-theoretic lower bound of $Ω(λ_1λ_2 d^2 / (Δ^2 t))$ -- the first for compressed eigenvector estimation -- proving that the $d^2$ factor, an additional factor of $d$ compared to the fully-observed minimax rate $Θ(λ_1λ_2 d / (Δ^2 t))$, is the fundamental cost of compression and cannot be improved. In contrast, any non-adaptive scheme with two measurements per iteration suffers $Ω(λ_2^2 d^3 / (Δ^2 t))$, an additional power of $d$. This separates fully-observed, adaptive-compressed, and non-adaptive-compressed PCA across three powers of $d$. Our analysis handles the noisy setting where the covariance has nonzero trailing eigenvalues, providing the first convergence guarantee for adaptive compressed subspace tracking beyond the noiseless case.
NAOct 2, 2017
Fast and guaranteed blind multichannel deconvolution under a bilinear system modelKiryung Lee, Ning Tian, Justin Romberg
We consider the multichannel blind deconvolution problem where we observe the output of multiple channels that are all excited with the same unknown input. From these observations, we wish to estimate the impulse responses of each of the channels. We show that this problem is well-posed if the channels follow a bilinear model where the ensemble of channel responses is modeled as lying in a low-dimensional subspace but with each channel modulated by an independent gain. Under this model, we show how the channel estimates can be found by minimizing a quadratic functional over a non-convex set. We analyze two methods for solving this non-convex program, and provide performance guarantees for each. The first is a method of alternating eigenvectors that breaks the program down into a series of eigenvalue problems. The second is a truncated power iteration, which can roughly be interpreted as a method for finding the largest eigenvector of a symmetric matrix with the additional constraint that it adheres to our bilinear model. As with most non-convex optimization algorithms, the performance of both of these algorithms is highly dependent on having a good starting point. We show how such a starting point can be constructed from the channel measurements. Our performance guarantees are non-asymptotic, and provide a sufficient condition on the number of samples observed per channel in order to guarantee channel estimates of a certain accuracy. Our analysis uses a model with a "generic" subspace that is drawn at random, and we show the performance bounds hold with high probability. Mathematically, the key estimates are derived by quantifying how well the eigenvectors of certain random matrices approximate the eigenvectors of their mean. We also present a series of numerical results demonstrating that the empirical performance is consistent with the presented theory.
NAFeb 22, 2010
Sparse Channel Separation using Random ProbesJustin Romberg, Ramesh Neelamani
This paper considers the problem of estimating the channel response (or Green's function) between multiple source-receiver pairs. Typically, the channel responses are estimated one-at-a-time: a single source sends out a known probe signal, the receiver measures the probe signal convolved with the channel response, and the responses are recovered using deconvolution. In this paper, we show that if the channel responses are sparse and the probe signals are random, then we can significantly reduce the total amount of time required to probe the channels by activating all of the sources simultaneously. With all sources activated simultaneously, the receiver measures a superposition of all the channel responses convolved with the respective probe signals. Separating this cumulative response into individual channel responses can be posed as a linear inverse problem. We show that channel response separation is possible (and stable) even when the probing signals are relatively short in spite of the corresponding linear system of equations becoming severely underdetermined. We derive a theoretical lower bound on the length of the source signals that guarantees that this separation is possible with high probability. The bound is derived by putting the problem in the context of finding a sparse solution to an underdetermined system of equations, and then using mathematical tools from the theory of compressive sensing. Finally, we discuss some practical applications of these results, which include forward modeling for seismic imaging, channel equalization in multiple-input multiple-output communication, and increasing the field-of-view in an imaging system by using coded apertures.
OCMay 27, 2022
Regularized Gradient Descent Ascent for Two-Player Zero-Sum Markov GamesSihan Zeng, Thinh T. Doan, Justin Romberg
We study the problem of finding the Nash equilibrium in a two-player zero-sum Markov game. Due to its formulation as a minimax optimization program, a natural approach to solve the problem is to perform gradient descent/ascent with respect to each player in an alternating fashion. However, due to the non-convexity/non-concavity of the underlying objective function, theoretical understandings of this method are limited. In our paper, we consider solving an entropy-regularized variant of the Markov game. The regularization introduces structure into the optimization landscape that make the solutions more identifiable and allow the problem to be solved more efficiently. Our main contribution is to show that under proper choices of the regularization parameter, the gradient descent ascent algorithm converges to the Nash equilibrium of the original unregularized problem. We explicitly characterize the finite-time performance of the last iterate of our algorithm, which vastly improves over the existing convergence bound of the gradient descent ascent algorithm without regularization. Finally, we complement the analysis with numerical simulations that illustrate the accelerated convergence of the algorithm.
LGApr 4, 2023
Decentralized and Privacy-Preserving Learning of Approximate Stackelberg Solutions in Energy Trading Games with Demand Response AggregatorsStyliani I. Kampezidou, Justin Romberg, Kyriakos G. Vamvoudakis et al.
In this work, a novel Stackelberg game theoretic framework is proposed for trading energy bidirectionally between the demand-response (DR) aggregator and the prosumers. This formulation allows for flexible energy arbitrage and additional monetary rewards while ensuring that the prosumers' desired daily energy demand is met. Then, a scalable (linear with the number of prosumers), decentralized, privacy-preserving algorithm is proposed to find approximate equilibria with online sampling and learning of the prosumers' cumulative best response, which finds applications beyond this energy game. Moreover, cost bounds are provided on the quality of the approximate equilibrium solution. Finally, real data from the California day-ahead market and the UC Davis campus building energy demands are utilized to demonstrate the efficacy of the proposed framework and algorithm.
NAFeb 28, 2013
Superfast Tikhonov Regularization of Toeplitz SystemsChristopher Turnes, Doru Balcan, Justin Romberg
Toeplitz-structured linear systems arise often in practical engineering problems. Correspondingly, a number of algorithms have been developed that exploit Toeplitz structure to gain computational efficiency when solving these systems. The earliest "fast" algorithms for Toeplitz systems required O(n^2) operations, while more recent "superfast" algorithms reduce the cost to O(n (log n)^2) or below. In this work, we present a superfast algorithm for Tikhonov regularization of Toeplitz systems. Using an "extension-and-transformation" technique, our algorithm translates a Tikhonov-regularized Toeplitz system into a type of specialized polynomial problem known as tangential interpolation. Under this formulation, we can compute the solution in only O(n (log n)^2) operations. We use numerical simulations to demonstrate our algorithm's complexity and verify that it returns stable solutions.
IVOct 10, 2022
Loop Unrolled Shallow Equilibrium Regularizer (LUSER) -- A Memory-Efficient Inverse Problem SolverPeimeng Guan, Jihui Jin, Justin Romberg et al.
In inverse problems we aim to reconstruct some underlying signal of interest from potentially corrupted and often ill-posed measurements. Classical optimization-based techniques proceed by optimizing a data consistency metric together with a regularizer. Current state-of-the-art machine learning approaches draw inspiration from such techniques by unrolling the iterative updates for an optimization-based solver and then learning a regularizer from data. This loop unrolling (LU) method has shown tremendous success, but often requires a deep model for the best performance leading to high memory costs during training. Thus, to address the balance between computation cost and network expressiveness, we propose an LU algorithm with shallow equilibrium regularizers (LUSER). These implicit models are as expressive as deeper convolutional networks, but far more memory efficient during training. The proposed method is evaluated on image deblurring, computed tomography (CT), as well as single-coil Magnetic Resonance Imaging (MRI) tasks and shows similar, or even better, performance while requiring up to 8 times less computational resources during training when compared against a more typical LU architecture with feedforward convolutional regularizers.
LGMar 23, 2023
Connected Superlevel Set in (Deep) Reinforcement Learning and its Application to Minimax TheoremsSihan Zeng, Thinh T. Doan, Justin Romberg
The aim of this paper is to improve the understanding of the optimization landscape for policy optimization problems in reinforcement learning. Specifically, we show that the superlevel set of the objective function with respect to the policy parameter is always a connected set both in the tabular setting and under policies represented by a class of neural networks. In addition, we show that the optimization objective as a function of the policy parameter and reward satisfies a stronger "equiconnectedness" property. To our best knowledge, these are novel and previously unknown discoveries. We present an application of the connectedness of these superlevel sets to the derivation of minimax theorems for robust reinforcement learning. We show that any minimax optimization program which is convex on one side and is equiconnected on the other side observes the minimax equality (i.e. has a Nash equilibrium). We find that this exact structure is exhibited by an interesting robust reinforcement learning problem under an adversarial reward attack, and the validity of its minimax equality immediately follows. This is the first time such a result is established in the literature.
MLNov 5, 2025
A general technique for approximating high-dimensional empirical kernel matricesChiraag Kaushik, Justin Romberg, Vidya Muthukumar
We present simple, user-friendly bounds for the expected operator norm of a random kernel matrix under general conditions on the kernel function $k(\cdot,\cdot)$. Our approach uses decoupling results for U-statistics and the non-commutative Khintchine inequality to obtain upper and lower bounds depending only on scalar statistics of the kernel function and a ``correlation kernel'' matrix corresponding to $k(\cdot,\cdot)$. We then apply our method to provide new, tighter approximations for inner-product kernel matrices on general high-dimensional data, where the sample size and data dimension are polynomially related. Our method obtains simplified proofs of existing results that rely on the moment method and combinatorial arguments while also providing novel approximation results for the case of anisotropic Gaussian data. Finally, using similar techniques to our approximation result, we show a tighter lower bound on the bias of kernel regression with anisotropic Gaussian data.
CVMar 2
3D Field of Junctions: A Noise-Robust, Training-Free Structural Prior for Volumetric Inverse ProblemsNamhoon Kim, Narges Moeini, Justin Romberg et al.
Volume denoising is a foundational problem in computational imaging, as many 3D imaging inverse problems face high levels of measurement noise. Inspired by the strong 2D image denoising properties of Field of Junctions (ICCV 2021), we propose a novel, fully volumetric 3D Field of Junctions (3D FoJ) representation that optimizes a junction of 3D wedges that best explain each 3D patch of a full volume, while encouraging consistency between overlapping patches. In addition to direct volume denoising, we leverage our 3D FoJ representation as a structural prior that: (i) requires no training data, and thus precludes the risk of hallucination, (ii) preserves and enhances sharp edge and corner structures in 3D, even under low signal to noise ratio (SNR), and (iii) can be used as a drop-in denoising representation via projected or proximal gradient descent for any volumetric inverse problem with low SNR. We demonstrate successful volume reconstruction and denoising with 3D FoJ across three diverse 3D imaging tasks with low-SNR measurements: low-dose X-ray computed tomography (CT), cryogenic electron tomography (cryo-ET), and denoising point clouds such as those from lidar in adverse weather. Across these challenging low-SNR volumetric imaging problems, 3D FoJ outperforms a mixture of classical and neural methods.
LGMar 3, 2025
Accelerating Multi-Task Temporal Difference Learning under Low-Rank RepresentationYitao Bai, Sihan Zeng, Justin Romberg et al.
We study policy evaluation problems in multi-task reinforcement learning (RL) under a low-rank representation setting. In this setting, we are given $N$ learning tasks where the corresponding value function of these tasks lie in an $r$-dimensional subspace, with $r<N$. One can apply the classic temporal-difference (TD) learning method for solving these problems where this method learns the value function of each task independently. In this paper, we are interested in understanding whether one can exploit the low-rank structure of the multi-task setting to accelerate the performance of TD learning. To answer this question, we propose a new variant of TD learning method, where we integrate the so-called truncated singular value decomposition step into the update of TD learning. This additional step will enable TD learning to exploit the dominant directions due to the low rank structure to update the iterates, therefore, improving its performance. Our empirical results show that the proposed method significantly outperforms the classic TD learning, where the performance gap increases as the rank $r$ decreases. From the theoretical point of view, introducing the truncated singular value decomposition step into TD learning might cause an instability on the updates. We provide a theoretical result showing that the instability does not happen. Specifically, we prove that the proposed method converges at a rate $\mathcal{O}(\frac{\ln(t)}{t})$, where $t$ is the number of iterations. This rate matches that of the standard TD learning.
SYMar 31
Finite-Time Analysis of Projected Two-Time-Scale Stochastic ApproximationYitao Bai, Thinh T. Doan, Justin Romberg
We study the finite-time convergence of projected linear two-time-scale stochastic approximation with constant step sizes and Polyak--Ruppert averaging. We establish an explicit mean-square error bound, decomposing it into two interpretable components, an approximation error determined by the constrained subspace and a statistical error decaying at a sublinear rate, with constants expressed through restricted stability margins and a coupling invertibility condition. These constants cleanly separate the effect of subspace choice (approximation errors) from the effect of the averaging horizon (statistical errors). We illustrate our theoretical results through a number of numerical experiments on both synthetic and reinforcement learning problems.
SPOct 16, 2024
Radon Implicit Field Transform (RIFT): Learning Scenes from Radar SignalsDaqian Bao, Alex Saad-Falcon, Justin Romberg
Data acquisition in array signal processing (ASP) is costly because achieving high angular and range resolutions necessitates large antenna apertures and wide frequency bandwidths, respectively. The data requirements for ASP problems grow multiplicatively with the number of viewpoints and frequencies, significantly increasing the burden of data collection, even for simulation. Implicit Neural Representations (INRs) -- neural network-based models of 3D objects and scenes -- offer compact and continuous representations with minimal radar data. They can interpolate to unseen viewpoints and potentially address the sampling cost in ASP problems. In this work, we select Synthetic Aperture Radar (SAR) as a case from ASP and propose Radon Implicit Field Transform (RIFT). RIFT consists of two components: a classical forward model for radar (Generalized Radon Transform, GRT), and an INR based scene representation learned from radar signals. This method can be extended to other ASP problems by replacing the GRT with appropriate algorithms corresponding to different data modalities. In our experiments, we first synthesize radar data using the GRT. We then train the INR model on this synthetic data by minimizing the reconstruction error of the radar signal. After training, we render the scene using the trained INR and evaluate our scene representation against the ground truth scene. Due to the lack of existing benchmarks, we introduce two main new error metrics: phase-Root Mean Square Error (p-RMSE) for radar signal interpolation, and magnitude-Structural Similarity Index measure(m-SSIM) for scene reconstruction. These metrics adapt traditional error measures to account for the complex nature of radar signals. Compared to traditional scene models in radar signal processing, with only 10% data footprint, our RIFT model achieves up to 188% improvement in scene reconstruction.
NAOct 11, 2024
Rapid Grassmannian Averaging with Chebyshev PolynomialsBrighton Ancelin, Alex Saad-Falcon, Kason Ancelin et al.
We propose new algorithms to efficiently average a collection of points on a Grassmannian manifold in both the centralized and decentralized settings. Grassmannian points are used ubiquitously in machine learning, computer vision, and signal processing to represent data through (often low-dimensional) subspaces. While averaging these points is crucial to many tasks (especially in the decentralized setting), existing methods unfortunately remain computationally expensive due to the non-Euclidean geometry of the manifold. Our proposed algorithms, Rapid Grassmannian Averaging (RGrAv) and Decentralized Rapid Grassmannian Averaging (DRGrAv), overcome this challenge by leveraging the spectral structure of the problem to rapidly compute an average using only small matrix multiplications and QR factorizations. We provide a theoretical guarantee of optimality and present numerical experiments which demonstrate that our algorithms outperform state-of-the-art methods in providing high accuracy solutions in minimal time. Additional experiments showcase the versatility of our algorithms to tasks such as K-means clustering on video motion data, establishing RGrAv and DRGrAv as powerful tools for generic Grassmannian averaging.
MLJun 4, 2024
Precise asymptotics of reweighted least-squares algorithms for linear diagonal networksChiraag Kaushik, Justin Romberg, Vidya Muthukumar
The classical iteratively reweighted least-squares (IRLS) algorithm aims to recover an unknown signal from linear measurements by performing a sequence of weighted least squares problems, where the weights are recursively updated at each step. Varieties of this algorithm have been shown to achieve favorable empirical performance and theoretical guarantees for sparse recovery and $\ell_p$-norm minimization. Recently, some preliminary connections have also been made between IRLS and certain types of non-convex linear neural network architectures that are observed to exploit low-dimensional structure in high-dimensional linear models. In this work, we provide a unified asymptotic analysis for a family of algorithms that encompasses IRLS, the recently proposed lin-RFM algorithm (which was motivated by feature learning in neural networks), and the alternating minimization algorithm on linear diagonal neural networks. Our analysis operates in a "batched" setting with i.i.d. Gaussian covariates and shows that, with appropriately chosen reweighting policy, the algorithm can achieve favorable performance in only a handful of iterations. We also extend our results to the case of group-sparse recovery and show that leveraging this structure in the reweighting scheme provably improves test error compared to coordinate-wise reweighting.
OCMay 3, 2024
Natural Policy Gradient and Actor Critic Methods for Constrained Multi-Task Reinforcement LearningSihan Zeng, Thinh T. Doan, Justin Romberg
Multi-task reinforcement learning (RL) aims to find a single policy that effectively solves multiple tasks at the same time. This paper presents a constrained formulation for multi-task RL where the goal is to maximize the average performance of the policy across tasks subject to bounds on the performance in each task. We consider solving this problem both in the centralized setting, where information for all tasks is accessible to a single server, and in the decentralized setting, where a network of agents, each given one task and observing local information, cooperate to find the solution of the globally constrained objective using local communication. We first propose a primal-dual algorithm that provably converges to the globally optimal solution of this constrained formulation under exact gradient evaluations. When the gradient is unknown, we further develop a sampled-based actor-critic algorithm that finds the optimal policy using online samples of state, action, and reward. Finally, we study the extension of the algorithm to the linear function approximation setting.
OCOct 28, 2021
Decentralized Feature-Distributed Optimization for Generalized Linear ModelsBrighton Ancelin, Sohail Bahmani, Justin Romberg
We consider the "all-for-one" decentralized learning problem for generalized linear models. The features of each sample are partitioned among several collaborating agents in a connected network, but only one agent observes the response variables. To solve the regularized empirical risk minimization in this distributed setting, we apply the Chambolle--Pock primal--dual algorithm to an equivalent saddle-point formulation of the problem. The primal and dual iterations are either in closed-form or reduce to coordinate-wise minimization of scalar convex functions. We establish convergence rates for the empirical risk minimization under two different assumptions on the loss function (Lipschitz and square root Lipschitz), and show how they depend on the characteristics of the design matrix and the Laplacian of the network.
OCOct 21, 2021
Finite-Time Complexity of Online Primal-Dual Natural Actor-Critic Algorithm for Constrained Markov Decision ProcessesSihan Zeng, Thinh T. Doan, Justin Romberg
We consider a discounted cost constrained Markov decision process (CMDP) policy optimization problem, in which an agent seeks to maximize a discounted cumulative reward subject to a number of constraints on discounted cumulative utilities. To solve this constrained optimization program, we study an online actor-critic variant of a classic primal-dual method where the gradients of both the primal and dual functions are estimated using samples from a single trajectory generated by the underlying time-varying Markov processes. This online primal-dual natural actor-critic algorithm maintains and iteratively updates three variables: a dual variable (or Lagrangian multiplier), a primal variable (or actor), and a critic variable used to estimate the gradients of both primal and dual variables. These variables are updated simultaneously but on different time scales (using different step sizes) and they are all intertwined with each other. Our main contribution is to derive a finite-time analysis for the convergence of this algorithm to the global optimum of a CMDP problem. Specifically, we show that with a proper choice of step sizes the optimality gap and constraint violation converge to zero in expectation at a rate $\mathcal{O}(1/K^{1/6})$, where K is the number of iterations. To our knowledge, this paper is the first to study the finite-time complexity of an online primal-dual actor-critic method for solving a CMDP problem. We also validate the effectiveness of this algorithm through numerical simulations.
OCSep 29, 2021
A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement LearningSihan Zeng, Thinh T. Doan, Justin Romberg
We study a new two-time-scale stochastic gradient method for solving optimization problems, where the gradients are computed with the aid of an auxiliary variable under samples generated by time-varying MDPs controlled by the underlying optimization variable. These time-varying samples make gradient directions in our update biased and dependent, which can potentially lead to the divergence of the iterates. In our two-time-scale approach, one scale is to estimate the true gradient from these samples, which is then used to update the estimate of the optimal solution. While these two iterates are implemented simultaneously, the former is updated "faster" than the latter. Our first contribution is to characterize the finite-time complexity of the proposed two-time-scale stochastic gradient method. In particular, we provide explicit formulas for the convergence rates of this method under different structural assumptions, namely, strong convexity, PL condition, and general non-convexity. We apply our framework to various policy optimization problems. First, we look at the infinite-horizon average-reward MDP with finite state and action spaces and derive a convergence rate of $O(k^{-2/5})$ for the online actor-critic algorithm under function approximation, which recovers the best known rate derived specifically for this problem. Second, we study the linear-quadratic regulator and show that an online actor-critic method converges with rate $O(k^{-2/3})$. Third, we use the actor-critic algorithm to solve the policy optimization problem in an entropy regularized Markov decision process, where we also establish a convergence of $O(k^{-2/3})$. The results we derive for both the second and third problem are novel and previously unknown in the literature. Finally, we briefly present the application of our framework to gradient-based policy evaluation algorithms in reinforcement learning.
LGJan 26, 2021
Finite Sample Analysis of Two-Time-Scale Natural Actor-Critic AlgorithmSajad Khodadadian, Thinh T. Doan, Justin Romberg et al.
Actor-critic style two-time-scale algorithms are one of the most popular methods in reinforcement learning, and have seen great empirical success. However, their performance is not completely understood theoretically. In this paper, we characterize the \emph{global} convergence of an online natural actor-critic algorithm in the tabular setting using a single trajectory of samples. Our analysis applies to very general settings, as we only assume ergodicity of the underlying Markov decision process. In order to ensure enough exploration, we employ an $ε$-greedy sampling of the trajectory. For a fixed and small enough exploration parameter $ε$, we show that the two-time-scale natural actor-critic algorithm has a rate of convergence of $\tilde{\mathcal{O}}(1/T^{1/4})$, where $T$ is the number of samples, and this leads to a sample complexity of $\Tilde{\mathcal{O}}(1/δ^{8})$ samples to find a policy that is within an error of $δ$ from the \emph{global optimum}. Moreover, by carefully decreasing the exploration parameter $ε$ as the iterations proceed, we present an improved sample complexity of $\Tilde{\mathcal{O}}(1/δ^{6})$ for convergence to the global optimum.
LGOct 28, 2020
Finite-Time Convergence Rates of Decentralized Stochastic Approximation with Applications in Multi-Agent and Multi-Task LearningSihan Zeng, Thinh T. Doan, Justin Romberg
We study a decentralized variant of stochastic approximation, a data-driven approach for finding the root of an operator under noisy measurements. A network of agents, each with its own operator and data observations, cooperatively find the fixed point of the aggregate operator over a decentralized communication graph. Our main contribution is to provide a finite-time analysis of this decentralized stochastic approximation method when the data observed at each agent are sampled from a Markov process; this lack of independence makes the iterates biased and (potentially) unbounded. Under fairly standard assumptions, we show that the convergence rate of the proposed method is essentially the same as if the samples were independent, differing only by a log factor that accounts for the mixing time of the Markov processes. The key idea in our analysis is to introduce a novel Razumikhin-Lyapunov function, motivated by the one used in analyzing the stability of delayed ordinary differential equations. We also discuss applications of the proposed method on a number of interesting learning problems in multi-agent systems.
LGJun 15, 2020
Fast Graph Attention Networks Using Effective Resistance Based Graph SparsificationRakshith S Srinivasa, Cao Xiao, Lucas Glass et al.
The attention mechanism has demonstrated superior performance for inference over nodes in graph neural networks (GNNs), however, they result in a high computational burden during both training and inference. We propose FastGAT, a method to make attention based GNNs lightweight by using spectral sparsification to generate an optimal pruning of the input graph. This results in a per-epoch time that is almost linear in the number of graph nodes as opposed to quadratic. We theoretically prove that spectral sparsification preserves the features computed by the GAT model, thereby justifying our algorithm. We experimentally evaluate FastGAT on several large real world graph datasets for node classification tasks under both inductive and transductive settings. FastGAT can dramatically reduce (up to \textbf{10x}) the computational time and memory requirements, allowing the usage of attention based GNNs on large graphs.
MLJun 13, 2020
Sample complexity and effective dimension for regression on manifoldsAndrew McRae, Justin Romberg, Mark Davenport
We consider the theory of regression on a manifold using reproducing kernel Hilbert space methods. Manifold models arise in a wide variety of modern machine learning problems, and our goal is to help understand the effectiveness of various implicit and explicit dimensionality-reduction methods that exploit manifold structure. Our first key contribution is to establish a novel nonasymptotic version of the Weyl law from differential geometry. From this we are able to show that certain spaces of smooth functions on a manifold are effectively finite-dimensional, with a complexity that scales according to the manifold dimension rather than any ambient data dimension. Finally, we show that given (potentially noisy) function values taken uniformly at random over a manifold, a kernel regression estimator (derived from the spectral decomposition of the manifold) yields minimax-optimal error bounds that are controlled by the effective dimension.
LGJun 8, 2020
A Decentralized Policy Gradient Approach to Multi-task Reinforcement LearningSihan Zeng, Aqeel Anwar, Thinh Doan et al.
We develop a mathematical framework for solving multi-task reinforcement learning (MTRL) problems based on a type of policy gradient method. The goal in MTRL is to learn a common policy that operates effectively in different environments; these environments have similar (or overlapping) state spaces, but have different rewards and dynamics. We highlight two fundamental challenges in MTRL that are not present in its single task counterpart, and illustrate them with simple examples. We then develop a decentralized entropy-regularized policy gradient method for solving the MTRL problem, and study its finite-time convergence rate. We demonstrate the effectiveness of the proposed method using a series of numerical experiments. These experiments range from small-scale "GridWorld" problems that readily demonstrate the trade-offs involved in multi-task learning to large-scale problems, where common policies are learned to navigate an airborne drone in multiple (simulated) environments.
OCMar 24, 2020
Finite-Time Analysis of Stochastic Gradient Descent under Markov RandomnessThinh T. Doan, Lam M. Nguyen, Nhan H. Pham et al.
Motivated by broad applications in reinforcement learning and machine learning, this paper considers the popular stochastic gradient descent (SGD) when the gradients of the underlying objective function are sampled from Markov processes. This Markov sampling leads to the gradient samples being biased and not independent. The existing results for the convergence of SGD under Markov randomness are often established under the assumptions on the boundedness of either the iterates or the gradient samples. Our main focus is to study the finite-time convergence of SGD for different types of objective functions, without requiring these assumptions. We show that SGD converges nearly at the same rate with Markovian gradient samples as with independent gradient samples. The only difference is a logarithmic factor that accounts for the mixing time of the Markov chain.
MLMar 20, 2020
Localized sketching for matrix multiplication and ridge regressionRakshith S Srinivasa, Mark A Davenport, Justin Romberg
We consider sketched approximate matrix multiplication and ridge regression in the novel setting of localized sketching, where at any given point, only part of the data matrix is available. This corresponds to a block diagonal structure on the sketching matrix. We show that, under mild conditions, block diagonal sketching matrices require only O(stable rank / ε^2) and $O( stat. dim. ε)$ total sample complexity for matrix multiplication and ridge regression, respectively. This matches the state-of-the-art bounds that are obtained using global sketching matrices. The localized nature of sketching considered allows for different parts of the data matrix to be sketched independently and hence is more amenable to computation in distributed and streaming settings and results in a smaller memory and computational footprint.
LGNov 9, 2019
Hardware-aware Pruning of DNNs using LFSR-Generated Pseudo-Random IndicesForoozan Karimzadeh, Ningyuan Cao, Brian Crafton et al.
Deep neural networks (DNNs) have been emerged as the state-of-the-art algorithms in broad range of applications. To reduce the memory foot-print of DNNs, in particular for embedded applications, sparsification techniques have been proposed. Unfortunately, these techniques come with a large hardware overhead. In this paper, we present a hardware-aware pruning method where the locations of non-zero weights are derived in real-time from a Linear Feedback Shift Registers (LFSRs). Using the proposed method, we demonstrate a total saving of energy and area up to 63.96% and 64.23% for VGG-16 network on down-sampled ImageNet, respectively for iso-compression-rate and iso-accuracy.
ROSep 12, 2019
A Reinforcement Learning Framework for Sequencing Multi-Robot BehaviorsPietro Pierpaoli, Thinh T. Doan, Justin Romberg et al.
Given a list of behaviors and associated parameterized controllers for solving different individual tasks, we study the problem of selecting an optimal sequence of coordinated behaviors in multi-robot systems for completing a given mission, which could not be handled by any single behavior. In addition, we are interested in the case where partial information of the underlying mission is unknown, therefore, the robots must cooperatively learn this information through their course of actions. Such problem can be formulated as an optimal decision problem in multi-robot systems, however, it is in general intractable due to modeling imperfections and the curse of dimensionality of the decision variables. To circumvent these issues, we first consider an alternate formulation of the original problem through introducing a sequence of behaviors' switching times. Our main contribution is then to propose a novel reinforcement learning based method, that combines Q-learning and online gradient descent, for solving this reformulated problem. In particular, the optimal sequence of the robots' behaviors is found by using Q-learning while the optimal parameters of the associated controllers are obtained through an online gradient descent method. Finally, to illustrate the effectiveness of our proposed method we implement it on a team of differential-drive robots for solving two different missions, namely, convoy protection and object manipulation.
MLAug 26, 2019
Convex Programming for Estimation in Nonlinear Recurrent ModelsSohail Bahmani, Justin Romberg
We propose a formulation for nonlinear recurrent models that includes simple parametric models of recurrent neural networks as a special case. The proposed formulation leads to a natural estimator in the form of a convex program. We provide a sample complexity for this estimator in the case of stable dynamics, where the nonlinear recursion has a certain contraction property, and under certain regularity conditions on the input distribution. We evaluate the performance of the estimator by simulation on synthetic data. These numerical experiments also suggest the extent at which the imposed theoretical assumptions may be relaxed.
OCJul 25, 2019
Finite-Time Performance of Distributed Temporal Difference Learning with Linear Function ApproximationThinh T. Doan, Siva Theja Maguluri, Justin Romberg
We study the policy evaluation problem in multi-agent reinforcement learning, modeled by a Markov decision process. In this problem, the agents operate in a common environment under a fixed control policy, working together to discover the value (global discounted accumulative reward) associated with each environmental state. Over a series of time steps, the agents act, get rewarded, update their local estimate of the value function, then communicate with their neighbors. The local update at each agent can be interpreted as a distributed variant of the popular temporal difference learning methods {\sf TD}$ (λ)$. Our main contribution is to provide a finite-analysis on the performance of this distributed {\sf TD}$(λ)$ algorithm for both constant and time-varying step sizes. The key idea in our analysis is to use the geometric mixing time $τ$ of the underlying Markov chain, that is, although the "noise" in our algorithm is Markovian, its dependence is very weak at samples spaced out at every $τ$. We provide an explicit upper bound on the convergence rate of the proposed method as a function of the network topology, the discount factor, the constant $λ$, and the mixing time $τ$. Our results also provide a mathematical explanation for observations that have appeared previously in the literature about the choice of $λ$. Our upper bound illustrates the trade-off between approximation accuracy and convergence speed implicit in the choice of $λ$. When $λ=1$, the solution will correspond to the best possible approximation of the value function, while choosing $λ= 0$ leads to faster convergence when the noise in the algorithm has large variance.
LGFeb 19, 2019
Fast Compressive Sensing Recovery Using Generative Models with Structured Latent VariablesShaojie Xu, Sihan Zeng, Justin Romberg
Deep learning models have significantly improved the visual quality and accuracy on compressive sensing recovery. In this paper, we propose an algorithm for signal reconstruction from compressed measurements with image priors captured by a generative model. We search and constrain on latent variable space to make the method stable when the number of compressed measurements is extremely limited. We show that, by exploiting certain structures of the latent variables, the proposed method produces improved reconstruction accuracy and preserves realistic and non-smooth features in the image. Our algorithm achieves high computation speed by projecting between the original signal space and the latent variable space in an alternating fashion.
CVFeb 19, 2019
Appearance-based Gesture recognition in the compressed domainShaojie Xu, Anvesha Amaravati, Justin Romberg et al.
We propose a novel appearance-based gesture recognition algorithm using compressed domain signal processing techniques. Gesture features are extracted directly from the compressed measurements, which are the block averages and the coded linear combinations of the image sensor's pixel values. We also improve both the computational efficiency and the memory requirement of the previous DTW-based K-NN gesture classifiers. Both simulation testing and hardware implementation strongly support the proposed algorithm.
LGJun 17, 2018
Fast Convex Pruning of Deep Neural NetworksAlireza Aghasi, Afshin Abdi, Justin Romberg
We develop a fast, tractable technique called Net-Trim for simplifying a trained neural network. The method is a convex post-processing module, which prunes (sparsifies) a trained network layer by layer, while preserving the internal responses. We present a comprehensive analysis of Net-Trim from both the algorithmic and sample complexity standpoints, centered on a fast, scalable convex optimization program. Our analysis includes consistency results between the initial and retrained models before and after Net-Trim application and guarantees on the number of training samples needed to discover a network that can be expressed using a certain number of nonzero terms. Specifically, if there is a set of weights that uses at most $s$ terms that can re-create the layer outputs from the layer inputs, we can find these weights from $\mathcal{O}(s\log N/s)$ samples, where $N$ is the input size. These theoretical results are similar to those for sparse regression using the Lasso, and our analysis uses some of the same recently-developed tools (namely recent results on the concentration of measure and convex analysis). Finally, we propose an algorithmic framework based on the alternating direction method of multipliers (ADMM), which allows a fast and simple implementation of Net-Trim for network pruning and compression.
NAAug 10, 2017
The Fast Slepian TransformSanthosh Karnik, Zhihui Zhu, Michael B. Wakin et al.
The discrete prolate spheroidal sequences (DPSS's) provide an efficient representation for discrete signals that are perfectly timelimited and nearly bandlimited. Due to the high computational complexity of projecting onto the DPSS basis - also known as the Slepian basis - this representation is often overlooked in favor of the fast Fourier transform (FFT). We show that there exist fast constructions for computing approximate projections onto the leading Slepian basis elements. The complexity of the resulting algorithms is comparable to the FFT, and scales favorably as the quality of the desired approximation is increased. In the process of bounding the complexity of these algorithms, we also establish new nonasymptotic results on the eigenvalue distribution of discrete time-frequency localization operators. We then demonstrate how these algorithms allow us to efficiently compute the solution to certain least-squares problems that arise in signal processing. We also provide simulations comparing these fast, approximate Slepian methods to exact Slepian methods as well as the traditional FFT based methods.
LGFeb 17, 2017
Solving Equations of Random Convex Functions via Anchored RegressionSohail Bahmani, Justin Romberg
We consider the question of estimating a solution to a system of equations that involve convex nonlinearities, a problem that is common in machine learning and signal processing. Because of these nonlinearities, conventional estimators based on empirical risk minimization generally involve solving a non-convex optimization program. We propose anchored regression, a new approach based on convex programming that amounts to maximizing a linear functional (perhaps augmented by a regularizer) over a convex set. The proposed convex program is formulated in the natural space of the problem, and avoids the introduction of auxiliary variables, making it computationally favorable. Working in the native space also provides great flexibility as structural priors (e.g., sparsity) can be seamlessly incorporated. For our analysis, we model the equations as being drawn from a fixed set according to a probability law. Our main results provide guarantees on the accuracy of the estimator in terms of the number of equations we are solving, the amount of noise present, a measure of statistical complexity of the random equations, and the geometry of the regularizer at the true solution. We also provide recipes for constructing the anchor vector (that determines the linear functional to maximize) directly from the observed data.
LGNov 16, 2016
Net-Trim: Convex Pruning of Deep Neural Networks with Performance GuaranteeAlireza Aghasi, Afshin Abdi, Nam Nguyen et al.
We introduce and analyze a new technique for model reduction for deep neural networks. While large networks are theoretically capable of learning arbitrarily complex models, overfitting and model redundancy negatively affects the prediction accuracy and model variance. Our Net-Trim algorithm prunes (sparsifies) a trained network layer-wise, removing connections at each layer by solving a convex optimization program. This program seeks a sparse set of weights at each layer that keeps the layer inputs and outputs consistent with the originally trained model. The algorithms and associated analysis are applicable to neural networks operating with the rectified linear unit (ReLU) as the nonlinear activation. We present both parallel and cascade versions of the algorithm. While the latter can achieve slightly simpler models with the same generalization performance, the former can be computed in a distributed manner. In both cases, Net-Trim significantly reduces the number of connections in the network, while also providing enough regularization to slightly reduce the generalization error. We also provide a mathematical analysis of the consistency between the initial network and the retrained model. To analyze the model sample complexity, we derive the general sufficient conditions for the recovery of a sparse transform matrix. For a single layer taking independent Gaussian random vectors of length $N$ as inputs, we show that if the network response can be described using a maximum number of $s$ non-zero weights per node, these weights can be learned from $\mathcal{O}(s\log N)$ samples.
ITOct 13, 2016
Phase Retrieval Meets Statistical Learning Theory: A Flexible Convex RelaxationSohail Bahmani, Justin Romberg
We propose a flexible convex relaxation for the phase retrieval problem that operates in the natural domain of the signal. Therefore, we avoid the prohibitive computational cost associated with "lifting" and semidefinite programming (SDP) in methods such as PhaseLift and compete with recently developed non-convex techniques for phase retrieval. We relax the quadratic equations for phaseless measurements to inequality constraints each of which representing a symmetric "slab". Through a simple convex program, our proposed estimator finds an extreme point of the intersection of these slabs that is best aligned with a given anchor vector. We characterize geometric conditions that certify success of the proposed estimator. Furthermore, using classic results in statistical learning theory, we show that for random measurements the geometric certificates hold with high probability at an optimal sample complexity. Phase transition of our estimator is evaluated through simulations. Our numerical experiments also suggest that the proposed method can solve phase retrieval problems with coded diffraction measurements as well.
CVMay 26, 2016
A Light-powered, Always-On, Smart Camera with Compressed Domain Gesture DetectionAnvesha A, Shaojie Xu, Ningyuan Cao et al.
In this paper we propose an energy-efficient camera-based gesture recognition system powered by light energy for "always on" applications. Low energy consumption is achieved by directly extracting gesture features from the compressed measurements, which are the block averages and the linear combinations of the image sensor's pixel values. The gestures are recognized using a nearest-neighbour (NN) classifier followed by Dynamic Time Warping (DTW). The system has been implemented on an Analog Devices Black Fin ULP vision processor and powered by PV cells whose output is regulated by TI's DC-DC buck converter with Maximum Power Point Tracking (MPPT). Measured data reveals that with only 400 compressed measurements (768x compression ratio) per frame, the system is able to recognize key wake-up gestures with greater than 80% accuracy and only 95mJ of energy per frame. Owing to its fully self-powered operation, the proposed system can find wide applications in "always-on" vision systems such as in surveillance, robotics and consumer electronics with touch-less operation.
CVMar 29, 2016
Sweep Distortion Removal from THz Images via Blind DemodulationAlireza Aghasi, Barmak Heshmat, Albert Redo-Sanchez et al.
Heavy sweep distortion induced by alignments and inter-reflections of layers of a sample is a major burden in recovering 2D and 3D information in time resolved spectral imaging. This problem cannot be addressed by conventional denoising and signal processing techniques as it heavily depends on the physics of the acquisition. Here we propose and implement an algorithmic framework based on low-rank matrix recovery and alternating minimization that exploits the forward model for THz acquisition. The method allows recovering the original signal in spite of the presence of temporal-spatial distortions. We address a blind-demodulation problem, where based on several observations of the sample texture modulated by an undesired sweep pattern, the two classes of signals are separated. The performance of the method is examined in both synthetic and experimental data, and the successful reconstructions are demonstrated. The proposed general scheme can be implemented to advance inspection and imaging applications in THz and other time-resolved sensing modalities.
CVFeb 23, 2016
Learning Shapes by Convex CompositionAlireza Aghasi, Justin Romberg
We present a mathematical and algorithmic scheme for learning the principal geometric elements in an image or 3D object. We build on recent work that convexifies the basic problem of finding a combination of a small number shapes that overlap and occlude one another in such a way that they "match" a given scene as closely as possible. This paper derives general sufficient conditions under which this convex shape composition identifies a target composition. From a computational standpoint, we present two different methods for solving the associated optimization programs. The first method simply recasts the problem as a linear program, while the second uses the alternating direction method of multipliers with a series of easily computed proximal operators.
ITJun 14, 2013
Sparse Recovery of Streaming Signals Using L1-HomotopyM. Salman Asif, Justin Romberg
Most of the existing methods for sparse signal recovery assume a static system: the unknown signal is a finite-length vector for which a fixed set of linear measurements and a sparse representation basis are available and an L1-norm minimization program is solved for the reconstruction. However, the same representation and reconstruction framework is not readily applicable in a streaming system: the unknown signal changes over time, and it is measured and reconstructed sequentially over small time intervals. In this paper, we discuss two such streaming systems and a homotopy-based algorithm for quickly solving the associated L1-norm minimization programs: 1) Recovery of a smooth, time-varying signal for which, instead of using block transforms, we use lapped orthogonal transforms for sparse representation. 2) Recovery of a sparse, time-varying signal that follows a linear dynamic model. For both the systems, we iteratively process measurements over a sliding interval and estimate sparse coefficients by solving a weighted L1-norm minimization program. Instead of solving a new L1 program from scratch at every iteration, we use an available signal estimate as a starting point in a homotopy formulation. Starting with a warm-start vector, our homotopy algorithm updates the solution in a small number of computationally inexpensive steps as the system changes. The homotopy algorithm presented in this paper is highly versatile as it can update the solution for the L1 problem in a number of dynamical settings. We demonstrate with numerical experiments that our proposed streaming recovery framework outperforms the methods that represent and reconstruct a signal as independent, disjoint blocks, in terms of quality of reconstruction, and that our proposed homotopy-based updating scheme outperforms current state-of-the-art solvers in terms of the computation time and complexity.
FAFeb 28, 2013
Sparse Shape ReconstructionAlireza Aghasi, Justin Romberg
This paper introduces a new shape-based image reconstruction technique applicable to a large class of imaging problems formulated in a variational sense. Given a collection of shape priors (a shape dictionary), we define our problem as choosing the right elements and geometrically composing them through basic set operations to characterize desired regions in the image. This combinatorial problem can be relaxed and then solved using classical descent methods. The main component of this relaxation is forming certain compactly supported functions which we call "knolls", and reformulating the shape representation as a basis expansion in terms of such functions. To select suitable elements of the dictionary, our problem ultimately reduces to solving a nonlinear program with sparsity constraints. We provide a new sparse nonlinear reconstruction technique to approach this problem. The performance of proposed technique is demonstrated with some standard imaging problems including image segmentation, X-ray tomography and diffusive tomography.
COAug 3, 2012
Fast and Accurate Algorithms for Re-Weighted L1-Norm MinimizationM. Salman Asif, Justin Romberg
To recover a sparse signal from an underdetermined system, we often solve a constrained L1-norm minimization problem. In many cases, the signal sparsity and the recovery performance can be further improved by replacing the L1 norm with a "weighted" L1 norm. Without any prior information about nonzero elements of the signal, the procedure for selecting weights is iterative in nature. Common approaches update the weights at every iteration using the solution of a weighted L1 problem from the previous iteration. In this paper, we present two homotopy-based algorithms that efficiently solve reweighted L1 problems. First, we present an algorithm that quickly updates the solution of a weighted L1 problem as the weights change. Since the solution changes only slightly with small changes in the weights, we develop a homotopy algorithm that replaces the old weights with the new ones in a small number of computationally inexpensive steps. Second, we propose an algorithm that solves a weighted L1 problem by adaptively selecting the weights while estimating the signal. This algorithm integrates the reweighting into every step along the homotopy path by changing the weights according to the changes in the solution and its support, allowing us to achieve a high quality signal reconstruction by solving a single homotopy problem. We compare the performance of both algorithms, in terms of reconstruction accuracy and computational complexity, against state-of-the-art solvers and show that our methods have smaller computational cost. In addition, we will show that the adaptive selection of the weights inside the homotopy often yields reconstructions of higher quality.
NADec 7, 2005
Stable Signal Recovery from Incomplete and Inaccurate MeasurementsEmmanuel Candes, Justin Romberg, Terence Tao
Suppose we wish to recover an n-dimensional real-valued vector x_0 (e.g. a digital signal or image) from incomplete and contaminated observations y = A x_0 + e; A is a n by m matrix with far fewer rows than columns (n << m) and e is an error term. Is it possible to recover x_0 accurately based on the data y? To recover x_0, we consider the solution x* to the l1-regularization problem min \|x\|_1 subject to \|Ax-y\|_2 <= epsilon, where epsilon is the size of the error term e. We show that if A obeys a uniform uncertainty principle (with unit-normed columns) and if the vector x_0 is sufficiently sparse, then the solution is within the noise level \|x* - x_0\|_2 \le C epsilon. As a first example, suppose that A is a Gaussian random matrix, then stable recovery occurs for almost all such A's provided that the number of nonzeros of x_0 is of about the same order as the number of observations. Second, suppose one observes few Fourier samples of x_0, then stable recovery occurs for almost any set of p coefficients provided that the number of nonzeros is of the order of n/[\log m]^6. In the case where the error term vanishes, the recovery is of course exact, and this work actually provides novel insights on the exact recovery phenomenon discussed in earlier papers. The methodology also explains why one can also very nearly recover approximately sparse signals.
NASep 10, 2004
Robust Uncertainty Principles: Exact Signal Reconstruction from Highly Incomplete Frequency InformationEmmanuel Candes, Justin Romberg, Terence Tao
This paper considers the model problem of reconstructing an object from incomplete frequency samples. Consider a discrete-time signal $f \in \C^N$ and a randomly chosen set of frequencies $Ω$ of mean size $τN$. Is it possible to reconstruct $f$ from the partial knowledge of its Fourier coefficients on the set $Ω$? A typical result of this paper is as follows: for each $M > 0$, suppose that $f$ obeys $$ # \{t, f(t) \neq 0 \} \le α(M) \cdot (\log N)^{-1} \cdot # Ω, $$ then with probability at least $1-O(N^{-M})$, $f$ can be reconstructed exactly as the solution to the $\ell_1$ minimization problem $$ \min_g \sum_{t = 0}^{N-1} |g(t)|, \quad \text{s.t.} \hat g(ω) = \hat f(ω) \text{for all} ω\in Ω. $$ In short, exact recovery may be obtained by solving a convex optimization problem. We give numerical values for $α$ which depends on the desired probability of success; except for the logarithmic factor, the condition on the size of the support is sharp. The methodology extends to a variety of other setups and higher dimensions. For example, we show how one can reconstruct a piecewise constant (one or two-dimensional) object from incomplete frequency samples--provided that the number of jumps (discontinuities) obeys the condition above--by minimizing other convex functionals such as the total-variation of $f$.