OCDec 17, 2011
Coherence in Large-Scale Networks: Dimension-Dependent Limitations of Local FeedbackBassam Bamieh, Mihailo R. Jovanović, Partha Mitra et al.
We consider distributed consensus and vehicular formation control problems. Specifically we address the question of whether local feedback is sufficient to maintain coherence in large-scale networks subject to stochastic disturbances. We define macroscopic performance measures which are global quantities that capture the notion of coherence; a notion of global order that quantifies how closely the formation resembles a solid object. We consider how these measures scale asymptotically with network size in the topologies of regular lattices in 1, 2 and higher dimensions, with vehicular platoons corresponding to the 1 dimensional case. A common phenomenon appears where a higher spatial dimension implies a more favorable scaling of coherence measures, with a dimensions of 3 being necessary to achieve coherence in consensus and vehicular formations under certain conditions. In particular, we show that it is impossible to have large coherent one dimensional vehicular platoons with only local feedback. We analyze these effects in terms of the underlying energetic modes of motion, showing that they take the form of large temporal and spatial scales resulting in an accordion-like motion of formations. A conclusion can be drawn that in low spatial dimensions, local feedback is unable to regulate large-scale disturbances, but it can in higher spatial dimensions. This phenomenon is distinct from, and unrelated to string instability issues which are commonly encountered in control problems for automated highways.
OCMar 20, 2013
Design of Optimal Sparse Feedback Gains via the Alternating Direction Method of MultipliersFu Lin, Makan Fardad, Mihailo R. Jovanović
We design sparse and block sparse feedback gains that minimize the variance amplification (i.e., the $H_2$ norm) of distributed systems. Our approach consists of two steps. First, we identify sparsity patterns of feedback gains by incorporating sparsity-promoting penalty functions into the optimal control problem, where the added terms penalize the number of communication links in the distributed controller. Second, we optimize feedback gains subject to structural constraints determined by the identified sparsity patterns. In the first step, the sparsity structure of feedback gains is identified using the alternating direction method of multipliers, which is a powerful algorithm well-suited to large optimization problems. This method alternates between promoting the sparsity of the controller and optimizing the closed-loop performance, which allows us to exploit the structure of the corresponding objective functions. In particular, we take advantage of the separability of the sparsity-promoting penalty functions to decompose the minimization problem into sub-problems that can be solved analytically. Several examples are provided to illustrate the effectiveness of the developed approach.
OCDec 18, 2011
Optimal Control of Vehicular Formations with Nearest Neighbor InteractionsFu Lin, Makan Fardad, Mihailo R. Jovanović
We consider the design of optimal localized feedback gains for one-dimensional formations in which vehicles only use information from their immediate neighbors. The control objective is to enhance coherence of the formation by making it behave like a rigid lattice. For the single-integrator model with symmetric gains, we establish convexity, implying that the globally optimal controller can be computed efficiently. We also identify a class of convex problems for double-integrators by restricting the controller to symmetric position and uniform diagonal velocity gains. To obtain the optimal non-symmetric gains for both the single- and the double-integrator models, we solve a parameterized family of optimal control problems ranging from an easily solvable problem to the problem of interest as the underlying parameter increases. When this parameter is kept small, we employ perturbation analysis to decouple the matrix equations that result from the optimality conditions, thereby rendering the unique optimal feedback gain. This solution is used to initialize a homotopy-based Newton's method to find the optimal localized gain. To investigate the performance of localized controllers, we examine how the coherence of large-scale stochastically forced formations scales with the number of vehicles. We establish several explicit scaling relationships and show that the best performance is achieved by a localized controller that is both non-symmetric and spatially-varying.
OCAug 25, 2018
The proximal augmented Lagrangian method for nonsmooth composite optimizationNeil K. Dhingra, Sei Zhen Khong, Mihailo R. Jovanović
We study a class of optimization problems in which the objective function is given by the sum of a differentiable but possibly nonconvex component and a nondifferentiable convex regularization term. We introduce an auxiliary variable to separate the objective function components and utilize the Moreau envelope of the regularization term to derive the proximal augmented Lagrangian $-$ a continuously differentiable function obtained by constraining the augmented Lagrangian to the manifold that corresponds to the explicit minimization over the variable in the nonsmooth term. The continuous differentiability of this function with respect to both primal and dual variables allows us to leverage the method of multipliers (MM) to compute optimal primal-dual pairs by solving a sequence of differentiable problems. The MM algorithm is applicable to a broader class of problems than proximal gradient methods and it has stronger convergence guarantees and a more refined step-size update rules than the alternating direction method of multipliers. These features make it an attractive option for solving structured optimal control problems. We also develop an algorithm based on the primal-descent dual-ascent gradient method and prove global (exponential) asymptotic stability when the differentiable component of the objective function is (strongly) convex and the regularization term is convex. Finally, we identify classes of problems for which the primal-dual gradient flow dynamics are convenient for distributed implementation and compare/contrast our framework to the existing approaches.
OCOct 11, 2013
Design of optimal sparse interconnection graphs for synchronization of oscillator networksMakan Fardad, Fu Lin, Mihailo R. Jovanović
We study the optimal design of a conductance network as a means for synchronizing a given set of oscillators. Synchronization is achieved when all oscillator voltages reach consensus, and performance is quantified by the mean-square deviation from the consensus value. We formulate optimization problems that address the trade-off between synchronization performance and the number and strength of oscillator couplings. We promote the sparsity of the coupling network by penalizing the number of interconnection links. For identical oscillators, we establish convexity of the optimization problem and demonstrate that the design problem can be formulated as a semidefinite program. Finally, for special classes of oscillator networks we derive explicit analytical expressions for the optimal conductance values.
OCOct 18, 2016
Topology design for stochastically-forced consensus networksSepideh Hassan-Moghaddam, Mihailo R. Jovanović
We study an optimal control problem aimed at achieving a desired tradeoff between the network coherence and communication requirements in the distributed controller. Our objective is to add a certain number of edges to an undirected network, with a known graph Laplacian, in order to optimally enhance closed-loop performance. To promote controller sparsity, we introduce $\ell_1$-regularization into the optimal ${\cal H}_2$ formulation and cast the design problem as a semidefinite program. We derive a Lagrange dual, provide interpretation of dual variables, and exploit structure of the optimality conditions for undirected networks to develop customized proximal gradient and Newton algorithms that are well-suited for large problems. We illustrate that our algorithms can solve the problems with more than million edges in the controller graph in a few minutes, on a PC. We also exploit structure of connected resistive networks to demonstrate how additional edges can be systematically added in order to minimize the ${\cal H}_2$ norm of the closed-loop system.
OCMar 4, 2018
Structured decentralized control of positive systems with applications to combination drug therapy and leader selection in directed networksNeil K. Dhingra, Marcello Colombino, Mihailo R. Jovanović
We study a class of structured optimal control problems in which the main diagonal of the dynamic matrix is a linear function of the design variable. While such problems are in general challenging and nonconvex, for positive systems we prove convexity of the $H_2$ and $H_\infty$ optimal control formulations which allow for arbitrary convex constraints and regularization of the control input. Moreover, we establish differentiability of the $H_\infty$ norm when the graph associated with the dynamical generator is weakly connected and develop a customized algorithm for computing the optimal solution even in the absence of differentiability. We apply our results to the problems of leader selection in directed consensus networks and combination drug therapy for HIV treatment. In the context of leader selection, we address the combinatorial challenge by deriving upper and lower bounds on optimal performance. For combination drug therapy, we develop a customized subgradient method for efficient treatment of diseases whose mutation patterns are not connected.
OCNov 30, 2016
On the optimal control problem for a class of monotone bilinear systemsNeil K. Dhingra, Marcello Colombino, Mihailo R. Jovanović et al.
We consider a class of monotone systems in which the control signal multiplies the state. Among other applications, such bilinear systems can be used to model the evolutionary dynamics of HIV in the presence of combination drug therapy. For this class of systems, we formulate an infinite horizon optimal control problem, prove that the optimal control signal is constant over time, and show that it can be computed by solving a finite-dimensional non-smooth convex optimization problem. We provide an explicit expression for the subdifferential set of the objective function and use a subgradient algorithm to design the optimal controller. We further extend our results to characterize the optimal robust controller for systems with uncertain dynamics and show that computing the robust controller is no harder than computing the nominal controller. We illustrate our results with an example motivated by combination drug therapy.
OCJun 6, 2022
Convergence and sample complexity of natural policy gradient primal-dual methods for constrained MDPsDongsheng Ding, Kaiqing Zhang, Jiali Duan et al.
We study the sequential decision making problem of maximizing the expected total reward while satisfying a constraint on the expected total utility. We employ the natural policy gradient method to solve the discounted infinite-horizon optimal control problem for Constrained Markov Decision Processes (constrained MDPs). Specifically, we propose a new Natural Policy Gradient Primal-Dual (NPG-PD) method that updates the primal variable via natural policy gradient ascent and the dual variable via projected subgradient descent. Although the underlying maximization involves a nonconcave objective function and a nonconvex constraint set, under the softmax policy parametrization, we prove that our method achieves global convergence with sublinear rates regarding both the optimality gap and the constraint violation. Such convergence is independent of the size of the state-action space, i.e., it is~dimension-free. Furthermore, for log-linear and general smooth policy parametrizations, we establish sublinear convergence rates up to a function approximation error caused by restricted policy parametrization. We also provide convergence and finite-sample complexity guarantees for two sample-based NPG-PD algorithms. We use a set of computational experiments to showcase the effectiveness of our approach.
OCSep 24, 2022
Tradeoffs between convergence rate and noise amplification for momentum-based accelerated optimization algorithmsHesameddin Mohammadi, Meisam Razaviyayn, Mihailo R. Jovanović
We study momentum-based first-order optimization algorithms in which the iterations utilize information from the two previous steps and are subject to an additive white noise. This setup uses noise to account for uncertainty in either gradient evaluation or iteration updates, and it includes Polyak's heavy-ball and Nesterov's accelerated methods as special cases. For strongly convex quadratic problems, we use the steady-state variance of the error in the optimization variable to quantify noise amplification and identify fundamental stochastic performance tradeoffs. Our approach utilizes the Jury stability criterion to provide a novel geometric characterization of conditions for linear convergence, and it reveals the relation between the noise amplification and convergence rate as well as their dependence on the condition number and the constant algorithmic parameters. This geometric insight leads to simple alternative proofs of standard convergence results and allows us to establish ``uncertainty principle'' of strongly convex optimization: for the two-step momentum method with linear convergence rate, the lower bound on the product between the settling time and noise amplification scales quadratically with the condition number. Our analysis also identifies a key difference between the gradient and iterate noise models: while the amplification of gradient noise can be made arbitrarily small by sufficiently decelerating the algorithm, the best achievable variance for the iterate noise model increases linearly with the settling time in the decelerating regime. Finally, we introduce two parameterized families of algorithms that strike a balance between noise amplification and settling time while preserving order-wise Pareto optimality for both noise models.
OCAug 28, 2024
Stability of Primal-Dual Gradient Flow Dynamics for Multi-Block Convex Optimization ProblemsIbrahim K. Ozaslan, Panagiotis Patrinos, Mihailo R. Jovanović
We examine stability properties of primal-dual gradient flow dynamics for composite convex optimization problems with multiple, possibly nonsmooth, terms in the objective function under the generalized consensus constraint. The proposed dynamics are based on the proximal augmented Lagrangian and they provide a viable alternative to ADMM which faces significant challenges from both analysis and implementation viewpoints in large-scale multi-block scenarios. In contrast to customized algorithms with individualized convergence guarantees, we develop a systematic approach for solving a broad class of challenging composite optimization problems. We leverage various structural properties to establish global (exponential) convergence guarantees for the proposed dynamics. Our assumptions are much weaker than those required to prove (exponential) stability of primal-dual dynamics as well as (linear) convergence of discrete-time methods such as standard two-block and multi-block ADMM and EXTRA algorithms. Finally, we show necessity of some of our structural assumptions for exponential stability and provide computational experiments to demonstrate the convenience of the proposed approach for parallel and distributed computing applications.
OCSep 18, 2024
From exponential to finite/fixed-time stability: Applications to optimizationIbrahim K. Ozaslan, Mihailo R. Jovanović
The development of finite/fixed-time stable optimization algorithms typically involves study of specific problem instances. The lack of a unified framework hinders understanding of more sophisticated algorithms, e.g., primal-dual gradient flow dynamics. The purpose of this paper is to address the following question: Given an exponentially stable optimization algorithm, can it be modified to obtain a finite/fixed-time stable algorithm? We provide an affirmative answer, demonstrate how the solution can be computed on a finite-time interval via a simple scaling of the right-hand-side of the original dynamics, and certify the desired properties of the modified algorithm using the Lyapunov function that proves exponential stability of the original system. Finally, we examine nonsmooth composite optimization problems and smooth problems with linear constraints to demonstrate the merits of our approach.
OCJul 30, 2024
Accelerated forward-backward and Douglas-Rachford splitting dynamicsIbrahim K. Ozaslan, Mihailo R. Jovanović
We examine convergence properties of continuous-time variants of accelerated Forward-Backward (FB) and Douglas-Rachford (DR) splitting algorithms for nonsmooth composite optimization problems. When the objective function is given by the sum of a quadratic and a nonsmooth term, we establish accelerated sublinear and exponential convergence rates for convex and strongly convex problems, respectively. Moreover, for FB splitting dynamics, we demonstrate that accelerated exponential convergence rate carries over to general strongly convex problems. In our Lyapunov-based analysis we exploit the variable-metric gradient interpretations of FB and DR splittings to obtain smooth Lyapunov functions that allow us to establish accelerated convergence rates. We provide computational experiments to demonstrate the merits and the effectiveness of our analysis.
LGNov 24, 2024
Stability properties of gradient flow dynamics for the symmetric low-rank matrix factorization problemHesameddin Mohammadi, Mohammad Tinati, Stephen Tu et al.
The symmetric low-rank matrix factorization serves as a building block in many learning tasks, including matrix recovery and training of neural networks. However, despite a flurry of recent research, the dynamics of its training via non-convex factorized gradient-descent-type methods is not fully understood especially in the over-parameterized regime where the fitted rank is higher than the true rank of the target matrix. To overcome this challenge, we characterize equilibrium points of the gradient flow dynamics and examine their local and global stability properties. To facilitate a precise global analysis, we introduce a nonlinear change of variables that brings the dynamics into a cascade connection of three subsystems whose structure is simpler than the structure of the original system. We demonstrate that the Schur complement to a principal eigenspace of the target matrix is governed by an autonomous system that is decoupled from the rest of the dynamics. In the over-parameterized regime, we show that this Schur complement vanishes at an $O(1/t)$ rate, thereby capturing the slow dynamics that arises from excess parameters. We utilize a Lyapunov-based approach to establish exponential convergence of the other two subsystems. By decoupling the fast and slow parts of the dynamics, we offer new insight into the shape of the trajectories associated with local search algorithms and provide a complete characterization of the equilibrium points and their global stability properties. Such an analysis via nonlinear control techniques may prove useful in several related over-parameterized problems.
LGMay 31, 2023
Provably Efficient Generalized Lagrangian Policy Optimization for Safe Multi-Agent Reinforcement LearningDongsheng Ding, Xiaohan Wei, Zhuoran Yang et al.
We examine online safe multi-agent reinforcement learning using constrained Markov games in which agents compete by maximizing their expected total rewards under a constraint on expected total utilities. Our focus is confined to an episodic two-player zero-sum constrained Markov game with independent transition functions that are unknown to agents, adversarial reward functions, and stochastic utility functions. For such a Markov game, we employ an approach based on the occupancy measure to formulate it as an online constrained saddle-point problem with an explicit constraint. We extend the Lagrange multiplier method in constrained optimization to handle the constraint by creating a generalized Lagrangian with minimax decision primal variables and a dual variable. Next, we develop an upper confidence reinforcement learning algorithm to solve this Lagrangian problem while balancing exploration and exploitation. Our algorithm updates the minimax decision primal variables via online mirror descent and the dual variable via projected gradient step and we prove that it enjoys sublinear rate $ O((|X|+|Y|) L \sqrt{T(|A|+|B|)}))$ for both regret and constraint violation after playing $T$ episodes of the game. Here, $L$ is the horizon of each episode, $(|X|,|A|)$ and $(|Y|,|B|)$ are the state/action space sizes of the min-player and the max-player, respectively. To the best of our knowledge, we provide the first provably efficient online safe reinforcement learning algorithm in constrained Markov games.
LGFeb 8, 2022
Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic ConvergenceDongsheng Ding, Chen-Yu Wei, Kaiqing Zhang et al.
We examine global non-asymptotic convergence properties of policy gradient methods for multi-agent reinforcement learning (RL) problems in Markov potential games (MPG). To learn a Nash equilibrium of an MPG in which the size of state space and/or the number of players can be very large, we propose new independent policy gradient algorithms that are run by all players in tandem. When there is no uncertainty in the gradient evaluation, we show that our algorithm finds an $ε$-Nash equilibrium with $O(1/ε^2)$ iteration complexity which does not explicitly depend on the state space size. When the exact gradient is not available, we establish $O(1/ε^5)$ sample complexity bound in a potentially infinitely large state space for a sample-based algorithm that utilizes function approximation. Moreover, we identify a class of independent policy gradient algorithms that enjoys convergence for both zero-sum Markov games and Markov cooperative games with the players that are oblivious to the types of games being played. Finally, we provide computational experiments to corroborate the merits and the effectiveness of our theoretical developments.
OCMar 14, 2021
Transient growth of accelerated optimization algorithmsHesameddin Mohammadi, Samantha Samuelson, Mihailo R. Jovanović
Optimization algorithms are increasingly being used in applications with limited time budgets. In many real-time and embedded scenarios, only a few iterations can be performed and traditional convergence metrics cannot be used to evaluate performance in these non-asymptotic regimes. In this paper, we examine the transient behavior of accelerated first-order optimization algorithms. For convex quadratic problems, we employ tools from linear systems theory to show that transient growth arises from the presence of non-normal dynamics. We identify the existence of modes that yield an algebraic growth in early iterations and quantify the transient excursion from the optimal solution caused by these modes. For strongly convex smooth optimization problems, we utilize the theory of integral quadratic constraints (IQCs) to establish an upper bound on the magnitude of the transient response of Nesterov's accelerated algorithm. We show that both the Euclidean distance between the optimization variable and the global minimizer and the rise time to the transient peak are proportional to the square root of the condition number of the problem. Finally, for problems with large condition numbers, we demonstrate tightness of the bounds that we derive up to constant factors.
LGMar 1, 2020
Provably Efficient Safe Exploration via Primal-Dual Policy OptimizationDongsheng Ding, Xiaohan Wei, Zhuoran Yang et al.
We study the Safe Reinforcement Learning (SRL) problem using the Constrained Markov Decision Process (CMDP) formulation in which an agent aims to maximize the expected total reward subject to a safety constraint on the expected total value of a utility function. We focus on an episodic setting with the function approximation where the Markov transition kernels have a linear structure but do not impose any additional assumptions on the sampling model. Designing SRL algorithms with provable computational and statistical efficiency is particularly challenging under this setting because of the need to incorporate both the safety constraint and the function approximation into the fundamental exploitation/exploration tradeoff. To this end, we present an \underline{O}ptimistic \underline{P}rimal-\underline{D}ual Proximal Policy \underline{OP}timization (OPDOP) algorithm where the value function is estimated by combining the least-squares policy evaluation and an additional bonus term for safe exploration. We prove that the proposed algorithm achieves an $\tilde{O}(d H^{2.5}\sqrt{T})$ regret and an $\tilde{O}(d H^{2.5}\sqrt{T})$ constraint violation, where $d$ is the dimension of the feature mapping, $H$ is the horizon of each episode, and $T$ is the total number of steps. These bounds hold when the reward/utility functions are fixed but the feedback after each episode is bandit. Our bounds depend on the capacity of the state-action space only through the dimension of the feature mapping and thus our results hold even when the number of states goes to infinity. To the best of our knowledge, we provide the first provably efficient online policy optimization algorithm for CMDP with safe exploration in the function approximation setting.
OCDec 26, 2019
Convergence and sample complexity of gradient methods for the model-free linear quadratic regulator problemHesameddin Mohammadi, Armin Zare, Mahdi Soltanolkotabi et al.
Model-free reinforcement learning attempts to find an optimal control action for an unknown dynamical system by directly searching over the parameter space of controllers. The convergence behavior and statistical properties of these approaches are often poorly understood because of the nonconvex nature of the underlying optimization problems and the lack of exact gradient computation. In this paper, we take a step towards demystifying the performance and efficiency of such methods by focusing on the standard infinite-horizon linear quadratic regulator problem for continuous-time systems with unknown state-space parameters. We establish exponential stability for the ordinary differential equation (ODE) that governs the gradient-flow dynamics over the set of stabilizing feedback gains and show that a similar result holds for the gradient descent method that arises from the forward Euler discretization of the corresponding ODE. We also provide theoretical bounds on the convergence rate and sample complexity of the random search method with two-point gradient estimates. We prove that the required simulation time for achieving $ε$-accuracy in the model-free setup and the total number of function evaluations both scale as $\log \, (1/ε)$.
OCOct 2, 2019
Global exponential stability of primal-dual gradient flow dynamics based on the proximal augmented Lagrangian: A Lyapunov-based approachDongsheng Ding, Mihailo R. Jovanović
For a class of nonsmooth composite optimization problems with linear equality constraints, we utilize a Lyapunov-based approach to establish the global exponential stability of the primal-dual gradient flow dynamics based on the proximal augmented Lagrangian. The result holds when the differentiable part of the objective function is strongly convex with a Lipschitz continuous gradient; the non-differentiable part is proper, lower semi-continuous, and convex; and the matrix in the linear constraint is full row rank. Our quadratic Lyapunov function generalizes recent result from strongly convex problems with either affine equality or inequality constraints to a broader class of composite optimization problems with nonsmooth regularizers and it provides a worst-case lower bound of the exponential decay rate. Finally, we use computational experiments to demonstrate that our convergence rate estimate is less conservative than the existing alternatives.
FLU-DYNAug 26, 2019
Stochastic dynamical modeling of turbulent flowsArmin Zare, Tryphon T. Georgiou, Mihailo R. Jovanović
Advanced measurement techniques and high performance computing have made large data sets available for a wide range of turbulent flows that arise in engineering applications. Drawing on this abundance of data, dynamical models can be constructed to reproduce structural and statistical features of turbulent flows, opening the way to the design of effective model-based flow control strategies. This review describes a framework for completing second-order statistics of turbulent flows by models that are based on the Navier-Stokes equations linearized around the turbulent mean velocity. Systems theory and convex optimization are combined to address the inherent uncertainty in the dynamics and the statistics of the flow by seeking a suitable parsimonious correction to the prior linearized model. Specifically, dynamical couplings between states of the linearized model dictate structural constraints on the statistics of flow fluctuations. Thence, colored-in-time stochastic forcing that drives the linearized model is sought to account for and reconcile dynamics with available data (i.e., partially known second order statistics). The number of dynamical degrees of freedom that are directly affected by stochastic excitation is minimized as a measure of model parsimony. The spectral content of the resulting colored-in-time stochastic contribution can alternatively be seen to arise from a low-rank structural perturbation of the linearized dynamical generator, pointing to suitable dynamical corrections that may account for the absence of the nonlinear interactions in the linearized model.
OCAug 23, 2019
Proximal gradient flow and Douglas-Rachford splitting dynamics: global exponential stability via integral quadratic constraintsSepideh Hassan-Moghaddam, Mihailo R. Jovanović
Many large-scale and distributed optimization problems can be brought into a composite form in which the objective function is given by the sum of a smooth term and a nonsmooth regularizer. Such problems can be solved via a proximal gradient method and its variants, thereby generalizing gradient descent to a nonsmooth setup. In this paper, we view proximal algorithms as dynamical systems and leverage techniques from control theory to study their global properties. In particular, for problems with strongly convex objective functions, we utilize the theory of integral quadratic constraints to prove the global exponential stability of the equilibrium points of the differential equations that govern the evolution of proximal gradient and Douglas-Rachford splitting flows. In our analysis, we use the fact that these algorithms can be interpreted as variable-metric gradient methods on the suitable envelopes and exploit structural properties of the nonlinear terms that arise from the gradient of the smooth part of the objective function and the proximal operator associated with the nonsmooth regularizer. We also demonstrate that these envelopes can be obtained from the augmented Lagrangian associated with the original nonsmooth problem and establish conditions for global exponential convergence even in the absence of strong convexity.
OCAug 7, 2019
Fast Multi-Agent Temporal-Difference Learning via Homotopy Stochastic Primal-Dual OptimizationDongsheng Ding, Xiaohan Wei, Zhuoran Yang et al.
We study the policy evaluation problem in multi-agent reinforcement learning where a group of agents, with jointly observed states and private local actions and rewards, collaborate to learn the value function of a given policy via local computation and communication over a connected undirected network. This problem arises in various large-scale multi-agent systems, including power grids, intelligent transportation systems, wireless sensor networks, and multi-agent robotics. When the dimension of state-action space is large, the temporal-difference learning with linear function approximation is widely used. In this paper, we develop a new distributed temporal-difference learning algorithm and quantify its finite-time performance. Our algorithm combines a distributed stochastic primal-dual method with a homotopy-based approach to adaptively adjust the learning rate in order to minimize the mean-square projected Bellman error by taking fresh online samples from a causal on-policy trajectory. We explicitly take into account the Markovian nature of sampling and improve the best-known finite-time error bound from $O(1/\sqrt{T})$ to~$O(1/T)$, where $T$ is the total number of iterations.
OCMay 27, 2019
Robustness of accelerated first-order algorithms for strongly convex optimization problemsHesameddin Mohammadi, Meisam Razaviyayn, Mihailo R. Jovanović
We study the robustness of accelerated first-order algorithms to stochastic uncertainties in gradient evaluation. Specifically, for unconstrained, smooth, strongly convex optimization problems, we examine the mean-squared error in the optimization variable when the iterates are perturbed by additive white noise. This type of uncertainty may arise in situations where an approximation of the gradient is sought through measurements of a real system or in a distributed computation over a network. Even though the underlying dynamics of first-order algorithms for this class of problems are nonlinear, we establish upper bounds on the mean-squared deviation from the optimal solution that are tight up to constant factors. Our analysis quantifies fundamental trade-offs between noise amplification and convergence rates obtained via any acceleration scheme similar to Nesterov's or heavy-ball methods. To gain additional analytical insight, for strongly convex quadratic problems, we explicitly evaluate the steady-state variance of the optimization variable in terms of the eigenvalues of the Hessian of the objective function. We demonstrate that the entire spectrum of the Hessian, rather than just the extreme eigenvalues, influence robustness of noisy algorithms. We specialize this result to the problem of distributed averaging over undirected networks and examine the role of network size and topology on the robustness of noisy accelerated algorithms.
OCJul 4, 2018
Proximal algorithms for large-scale statistical modeling and sensor/actuator selectionArmin Zare, Hesameddin Mohammadi, Neil K. Dhingra et al.
Several problems in modeling and control of stochastically-driven dynamical systems can be cast as regularized semi-definite programs. We examine two such representative problems and show that they can be formulated in a similar manner. The first, in statistical modeling, seeks to reconcile observed statistics by suitably and minimally perturbing prior dynamics. The second seeks to optimally select a subset of available sensors and actuators for control purposes. To address modeling and control of large-scale systems we develop a unified algorithmic framework using proximal methods. Our customized algorithms exploit problem structure and allow handling statistical modeling, as well as sensor and actuator selection, for substantially larger scales than what is amenable to current general-purpose solvers. We establish linear convergence of the proximal gradient algorithm, draw contrast between the proposed proximal algorithms and alternating direction method of multipliers, and provide examples that illustrate the merits and effectiveness of our framework.
OCSep 5, 2017
A second order primal-dual method for nonsmooth convex composite optimizationNeil K. Dhingra, Sei Zhen Khong, Mihailo R. Jovanović
We develop a second order primal-dual method for optimization problems in which the objective function is given by the sum of a strongly convex twice differentiable term and a possibly nondifferentiable convex regularizer. After introducing an auxiliary variable, we utilize the proximal operator of the nonsmooth regularizer to transform the associated augmented Lagrangian into a function that is once, but not twice, continuously differentiable. The saddle point of this function corresponds to the solution of the original optimization problem. We employ a generalization of the Hessian to define second order updates on this function and prove global exponential stability of the corresponding differential inclusion. Furthermore, we develop a globally convergent customized algorithm that utilizes the primal-dual augmented Lagrangian as a merit function. We show that the search direction can be computed efficiently and prove quadratic/superlinear asymptotic convergence. We use the $\ell_1$-regularized model predictive control problem and the problem of designing a distributed controller for a spatially-invariant system to demonstrate the merits and the effectiveness of our method.
OCFeb 3, 2013
Algorithms for leader selection in stochastically forced consensus networksFu Lin, Makan Fardad, Mihailo R. Jovanović
We are interested in assigning a pre-specified number of nodes as leaders in order to minimize the mean-square deviation from consensus in stochastically forced networks. This problem arises in several applications including control of vehicular formations and localization in sensor networks. For networks with leaders subject to noise, we show that the Boolean constraints (a node is either a leader or it is not) are the only source of nonconvexity. By relaxing these constraints to their convex hull we obtain a lower bound on the global optimal value. We also use a simple but efficient greedy algorithm to identify leaders and to compute an upper bound. For networks with leaders that perfectly follow their desired trajectories, we identify an additional source of nonconvexity in the form of a rank constraint. Removal of the rank constraint and relaxation of the Boolean constraints yields a semidefinite program for which we develop a customized algorithm well-suited for large networks. Several examples ranging from regular lattices to random graphs are provided to illustrate the effectiveness of the developed algorithms.