OCNov 14, 2023
Discretized Distributed Optimization over Dynamic DigraphsMohammadreza Doostmohammadian, Wei Jiang, Muwahida Liaquat et al.
We consider a discrete-time model of continuous-time distributed optimization over dynamic directed-graphs (digraphs) with applications to distributed learning. Our optimization algorithm works over general strongly connected dynamic networks under switching topologies, e.g., in mobile multi-agent systems and volatile networks due to link failures. Compared to many existing lines of work, there is no need for bi-stochastic weight designs on the links. The existing literature mostly needs the link weights to be stochastic using specific weight-design algorithms needed both at the initialization and at all times when the topology of the network changes. This paper eliminates the need for such algorithms and paves the way for distributed optimization over time-varying digraphs. We derive the bound on the gradient-tracking step-size and discrete time-step for convergence and prove dynamic stability using arguments from consensus algorithms, matrix perturbation theory, and Lyapunov theory. This work, particularly, is an improvement over existing stochastic-weight undirected networks in case of link removal or packet drops. This is because the existing literature may need to rerun time-consuming and computationally complex algorithms for stochastic design, while the proposed strategy works as long as the underlying network is weight-symmetric and balanced. The proposed optimization framework finds applications to distributed classification and learning.
SYMar 28, 2022
Distributed Finite-Sum Constrained Optimization subject to Nonlinearity on the Node DynamicsMohammadreza Doostmohammadian, Maria Vrakopoulou, Alireza Aghasi et al.
Motivated by recent development in networking and parallel data-processing, we consider a distributed and localized finite-sum (or fixed-sum) allocation technique to solve resource-constrained convex optimization problems over multi-agent networks (MANs). Such networks include (smart) agents representing an intelligent entity capable of communication, processing, and decision-making. In particular, we consider problems subject to practical nonlinear constraints on the dynamics of the agents in terms of their communications and actuation capabilities (referred to as the node dynamics), e.g., networks of mobile robots subject to actuator saturation and quantized communication. The considered distributed sum-preserving optimization solution further enables adding purposeful nonlinear constraints, for example, sign-based nonlinearities, to reach convergence in predefined-time or robust to impulsive noise and disturbances in faulty environments. Moreover, convergence can be achieved under minimal network connectivity requirements among the agents; thus, the solution is applicable over dynamic networks where the channels come and go due to the agent's mobility and limited range. This paper discusses how various nonlinearity constraints on the optimization problem (e.g., collaborative allocation of resources) can be addressed for different applications via a distributed setup (over a network).
SYApr 13, 2023
D-SVM over Networked Systems with Non-Ideal Linking ConditionsMohammadreza Doostmohammadian, Alireza Aghasi, Houman Zarrabi
This paper considers distributed optimization algorithms, with application in binary classification via distributed support-vector-machines (D-SVM) over multi-agent networks subject to some link nonlinearities. The agents solve a consensus-constraint distributed optimization cooperatively via continuous-time dynamics, while the links are subject to strongly sign-preserving odd nonlinear conditions. Logarithmic quantization and clipping (saturation) are two examples of such nonlinearities. In contrast to existing literature that mostly considers ideal links and perfect information exchange over linear channels, we show how general sector-bounded models affect the convergence to the optimizer (i.e., the SVM classifier) over dynamic balanced directed networks. In general, any odd sector-bounded nonlinear mapping can be applied to our dynamics. The main challenge is to show that the proposed system dynamics always have one zero eigenvalue (associated with the consensus) and the other eigenvalues all have negative real parts. This is done by recalling arguments from matrix perturbation theory. Then, the solution is shown to converge to the agreement state under certain conditions. For example, the gradient tracking (GT) step size is tighter than the linear case by factors related to the upper/lower sector bounds. To the best of our knowledge, no existing work in distributed optimization and learning literature considers non-ideal link conditions.
LGMay 26, 2022
RIGID: Robust Linear Regression with Missing DataAlireza Aghasi, MohammadJavad Feizollahi, Saeed Ghadimi
We present a robust framework to perform linear regression with missing entries in the features. By considering an elliptical data distribution, and specifically a multivariate normal model, we are able to conditionally formulate a distribution for the missing entries and present a robust framework, which minimizes the worst case error caused by the uncertainty about the missing data. We show that the proposed formulation, which naturally takes into account the dependency between different variables, ultimately reduces to a convex program, for which a customized and scalable solver can be delivered. In addition to a detailed analysis to deliver such solver, we also asymptoticly analyze the behavior of the proposed framework, and present technical discussions to estimate the required input parameters. We complement our analysis with experiments performed on synthetic, semi-synthetic, and real data, and show how the proposed formulation improves the prediction accuracy and robustness, and outperforms the competing techniques. Missing data is a common problem associated with many datasets in machine learning. With the significant increase in using robust optimization techniques to train machine learning models, this paper presents a novel robust regression framework that operates by minimizing the uncertainty associated with missing data. The proposed approach allows training models with incomplete data, while minimizing the impact of uncertainty associated with the unavailable data. The ideas developed in this paper can be generalized beyond linear models and elliptical data distributions.
LGNov 26, 2025Code
G-Net: A Provably Easy Construction of High-Accuracy Random Binary Neural NetworksAlireza Aghasi, Nicholas Marshall, Saeid Pourmand et al.
We propose a novel randomized algorithm for constructing binary neural networks with tunable accuracy. This approach is motivated by hyperdimensional computing (HDC), which is a brain-inspired paradigm that leverages high-dimensional vector representations, offering efficient hardware implementation and robustness to model corruptions. Unlike traditional low-precision methods that use quantization, we consider binary embeddings of data as points in the hypercube equipped with the Hamming distance. We propose a novel family of floating-point neural networks, G-Nets, which are general enough to mimic standard network layers. Each floating-point G-Net has a randomized binary embedding, an embedded hyperdimensional (EHD) G-Net, that retains the accuracy of its floating-point counterparts, with theoretical guarantees, due to the concentration of measure. Empirically, our binary models match convolutional neural network accuracies and outperform prior HDC models by large margins, for example, we achieve almost 30\% higher accuracy on CIFAR-10 compared to prior HDC models. G-Nets are a theoretically justified bridge between neural networks and randomized binary neural networks, opening a new direction for constructing robust binary/quantized deep learning models. Our implementation is available at https://github.com/GNet2025/GNet.
LGNov 19, 2020Code
Inverse Constrained Reinforcement LearningUsman Anwar, Shehryar Malik, Alireza Aghasi et al.
In real world settings, numerous constraints are present which are hard to specify mathematically. However, for the real world deployment of reinforcement learning (RL), it is critical that RL agents are aware of these constraints, so that they can act safely. In this work, we consider the problem of learning constraints from demonstrations of a constraint-abiding agent's behavior. We experimentally validate our approach and show that our framework can successfully learn the most likely constraints that the agent respects. We further show that these learned constraints are \textit{transferable} to new agents that may have different morphologies and/or reward functions. Previous works in this regard have either mainly been restricted to tabular (discrete) settings, specific types of constraints or assume the environment's transition dynamics. In contrast, our framework is able to learn arbitrary \textit{Markovian} constraints in high-dimensions in a completely model-free setting. The code can be found it: \url{https://github.com/shehryar-malik/icrl}.
OCMar 29, 2024
Fully Zeroth-Order Bilevel Programming via Gaussian SmoothingAlireza Aghasi, Saeed Ghadimi
In this paper, we study and analyze zeroth-order stochastic approximation algorithms for solving bilvel problems, when neither the upper/lower objective values, nor their unbiased gradient estimates are available. In particular, exploiting Stein's identity, we first use Gaussian smoothing to estimate first- and second-order partial derivatives of functions with two independent block of variables. We then used these estimates in the framework of a stochastic approximation algorithm for solving bilevel optimization problems and establish its non-asymptotic convergence analysis. To the best of our knowledge, this is the first time that sample complexity bounds are established for a fully stochastic zeroth-order bilevel optimization algorithm.
MLFeb 27, 2024
Certain and Approximately Certain Models for Statistical LearningCheng Zhen, Nischal Aryal, Arash Termehchy et al.
Real-world data is often incomplete and contains missing values. To train accurate models over real-world datasets, users need to spend a substantial amount of time and resources imputing and finding proper values for missing data items. In this paper, we demonstrate that it is possible to learn accurate models directly from data with missing values for certain training data and target models. We propose a unified approach for checking the necessity of data imputation to learn accurate models across various widely-used machine learning paradigms. We build efficient algorithms with theoretical guarantees to check this necessity and return accurate models in cases where imputation is unnecessary. Our extensive experiments indicate that our proposed algorithms significantly reduce the amount of time and effort needed for data imputation without imposing considerable computational overhead.
LGApr 16, 2024
Laplace-HDC: Understanding the geometry of binary hyperdimensional computingSaeid Pourmand, Wyatt D. Whiting, Alireza Aghasi et al.
This paper studies the geometry of binary hyperdimensional computing (HDC), a computational scheme in which data are encoded using high-dimensional binary vectors. We establish a result about the similarity structure induced by the HDC binding operator and show that the Laplace kernel naturally arises in this setting, motivating our new encoding method Laplace-HDC, which improves upon previous methods. We describe how our results indicate limitations of binary HDC in encoding spatial information from images and discuss potential solutions, including using Haar convolutional features and the definition of a translation-equivariant HDC encoding. Several numerical experiments highlighting the improved accuracy of Laplace-HDC in contrast to alternative methods are presented. We also numerically study other aspects of the proposed framework such as robustness and the underlying translation-equivariant encoding.
SYApr 1, 2021
Distributed support-vector-machine over dynamic balanced directed networksMohammadreza Doostmohammadian, Alireza Aghasi, Themistoklis Charalambous et al.
In this paper, we consider the binary classification problem via distributed Support-Vector-Machines (SVM), where the idea is to train a network of agents, with limited share of data, to cooperatively learn the SVM classifier for the global database. Agents only share processed information regarding the classifier parameters and the gradient of the local loss functions instead of their raw data. In contrast to the existing work, we propose a continuous-time algorithm that incorporates network topology changes in discrete jumps. This hybrid nature allows us to remove chattering that arises because of the discretization of the underlying CT process. We show that the proposed algorithm converges to the SVM classifier over time-varying weight balanced directed graphs by using arguments from the matrix perturbation theory.
SYDec 15, 2020
Fast-Convergent Dynamics for Distributed Allocation of Resources Over Switching Sparse Networks with Quantized Communication LinksMohammadreza Doostmohammadian, Alireza Aghasi, Mohammad Pirani et al.
This paper proposes networked dynamics to solve resource allocation problems over time-varying multi-agent networks. The state of each agent represents the amount of used resources (or produced utilities) while the total amount of resources is fixed. The idea is to optimally allocate the resources among the group of agents by minimizing the overall cost function subject to fixed sum of resources. Each agents' information is restricted to its own state and cost function and those of its immediate in-neighbors. This is motivated by distributed applications such as mobile edge-computing, economic dispatch over smart grids, and multi-agent coverage control. This work provides a fast convergent solution (in comparison with linear dynamics) while considering relaxed network connectivity with quantized communication links. The proposed dynamics reaches optimal solution over switching (possibly disconnected) undirected networks as far as their union over some bounded non-overlapping time-intervals has a spanning-tree. We prove feasibility of the solution, uniqueness of the optimal state, and convergence to the optimal value under the proposed dynamics, where the analysis is applicable to similar 1st-order allocation dynamics with strongly sign-preserving nonlinearities, such as actuator saturation.
LGMar 26, 2020
Learning To Solve Differential Equations Across Initial ConditionsShehryar Malik, Usman Anwar, Ali Ahmed et al.
Recently, there has been a lot of interest in using neural networks for solving partial differential equations. A number of neural network-based partial differential equation solvers have been formulated which provide performances equivalent, and in some cases even superior, to classical solvers. However, these neural solvers, in general, need to be retrained each time the initial conditions or the domain of the partial differential equation changes. In this work, we posit the problem of approximating the solution of a fixed partial differential equation for any arbitrary initial conditions as learning a conditional probability distribution. We demonstrate the utility of our method on Burger's Equation.
LGJun 17, 2018
Fast Convex Pruning of Deep Neural NetworksAlireza Aghasi, Afshin Abdi, Justin Romberg
We develop a fast, tractable technique called Net-Trim for simplifying a trained neural network. The method is a convex post-processing module, which prunes (sparsifies) a trained network layer by layer, while preserving the internal responses. We present a comprehensive analysis of Net-Trim from both the algorithmic and sample complexity standpoints, centered on a fast, scalable convex optimization program. Our analysis includes consistency results between the initial and retrained models before and after Net-Trim application and guarantees on the number of training samples needed to discover a network that can be expressed using a certain number of nonzero terms. Specifically, if there is a set of weights that uses at most $s$ terms that can re-create the layer outputs from the layer inputs, we can find these weights from $\mathcal{O}(s\log N/s)$ samples, where $N$ is the input size. These theoretical results are similar to those for sparse regression using the Lasso, and our analysis uses some of the same recently-developed tools (namely recent results on the concentration of measure and convex analysis). Finally, we propose an algorithmic framework based on the alternating direction method of multipliers (ADMM), which allows a fast and simple implementation of Net-Trim for network pruning and compression.
LGNov 16, 2016
Net-Trim: Convex Pruning of Deep Neural Networks with Performance GuaranteeAlireza Aghasi, Afshin Abdi, Nam Nguyen et al.
We introduce and analyze a new technique for model reduction for deep neural networks. While large networks are theoretically capable of learning arbitrarily complex models, overfitting and model redundancy negatively affects the prediction accuracy and model variance. Our Net-Trim algorithm prunes (sparsifies) a trained network layer-wise, removing connections at each layer by solving a convex optimization program. This program seeks a sparse set of weights at each layer that keeps the layer inputs and outputs consistent with the originally trained model. The algorithms and associated analysis are applicable to neural networks operating with the rectified linear unit (ReLU) as the nonlinear activation. We present both parallel and cascade versions of the algorithm. While the latter can achieve slightly simpler models with the same generalization performance, the former can be computed in a distributed manner. In both cases, Net-Trim significantly reduces the number of connections in the network, while also providing enough regularization to slightly reduce the generalization error. We also provide a mathematical analysis of the consistency between the initial network and the retrained model. To analyze the model sample complexity, we derive the general sufficient conditions for the recovery of a sparse transform matrix. For a single layer taking independent Gaussian random vectors of length $N$ as inputs, we show that if the network response can be described using a maximum number of $s$ non-zero weights per node, these weights can be learned from $\mathcal{O}(s\log N)$ samples.
CVMar 29, 2016
Sweep Distortion Removal from THz Images via Blind DemodulationAlireza Aghasi, Barmak Heshmat, Albert Redo-Sanchez et al.
Heavy sweep distortion induced by alignments and inter-reflections of layers of a sample is a major burden in recovering 2D and 3D information in time resolved spectral imaging. This problem cannot be addressed by conventional denoising and signal processing techniques as it heavily depends on the physics of the acquisition. Here we propose and implement an algorithmic framework based on low-rank matrix recovery and alternating minimization that exploits the forward model for THz acquisition. The method allows recovering the original signal in spite of the presence of temporal-spatial distortions. We address a blind-demodulation problem, where based on several observations of the sample texture modulated by an undesired sweep pattern, the two classes of signals are separated. The performance of the method is examined in both synthetic and experimental data, and the successful reconstructions are demonstrated. The proposed general scheme can be implemented to advance inspection and imaging applications in THz and other time-resolved sensing modalities.
CVFeb 23, 2016
Learning Shapes by Convex CompositionAlireza Aghasi, Justin Romberg
We present a mathematical and algorithmic scheme for learning the principal geometric elements in an image or 3D object. We build on recent work that convexifies the basic problem of finding a combination of a small number shapes that overlap and occlude one another in such a way that they "match" a given scene as closely as possible. This paper derives general sufficient conditions under which this convex shape composition identifies a target composition. From a computational standpoint, we present two different methods for solving the associated optimization programs. The first method simply recasts the problem as a linear program, while the second uses the alternating direction method of multipliers with a series of easily computed proximal operators.
FAFeb 28, 2013
Sparse Shape ReconstructionAlireza Aghasi, Justin Romberg
This paper introduces a new shape-based image reconstruction technique applicable to a large class of imaging problems formulated in a variational sense. Given a collection of shape priors (a shape dictionary), we define our problem as choosing the right elements and geometrically composing them through basic set operations to characterize desired regions in the image. This combinatorial problem can be relaxed and then solved using classical descent methods. The main component of this relaxation is forming certain compactly supported functions which we call "knolls", and reformulating the shape representation as a basis expansion in terms of such functions. To select suitable elements of the dictionary, our problem ultimately reduces to solving a nonlinear program with sparsity constraints. We provide a new sparse nonlinear reconstruction technique to approach this problem. The performance of proposed technique is demonstrated with some standard imaging problems including image segmentation, X-ray tomography and diffusive tomography.