Mehran Mesbahi

SY
h-index2
15papers
1,065citations
Novelty38%
AI Score40

15 Papers

OCOct 10, 2022
Towards a Theoretical Foundation of Policy Optimization for Learning Control Policies

Bin Hu, Kaiqing Zhang, Na Li et al.

Gradient-based methods have been widely used for system design and optimization in diverse application domains. Recently, there has been a renewed interest in studying theoretical properties of these methods in the context of control and reinforcement learning. This article surveys some of the recent developments on policy optimization, a gradient-based iterative approach for feedback control synthesis, popularized by successes of reinforcement learning. We take an interdisciplinary perspective in our exposition that connects control theory, reinforcement learning, and large-scale optimization. We review a number of recently-developed theoretical results on the optimization landscape, global convergence, and sample complexity of gradient-based methods for various continuous control problems such as the linear quadratic regulator (LQR), $\mathcal{H}_\infty$ control, risk-sensitive control, linear quadratic Gaussian (LQG) control, and output feedback synthesis. In conjunction with these optimization results, we also discuss how direct policy optimization handles stability and robustness concerns in learning-based control, two main desiderata in control engineering. We conclude the survey by pointing out several challenges and opportunities at the intersection of learning and control.

SYApr 4, 2017
Adaptive Communication Networks with Privacy Guarantees

Atiye Alaeddini, Kristi Morgansen, Mehran Mesbahi

Utilizing the concept of observability, in conjunction with tools from graph theory and optimization, this paper develops an algorithm for network synthesis with privacy guarantees. In particular, we propose an algorithm for the selection of optimal weights for the communication graph in order to maximize the privacy of nodes in the network, from a control theoretic perspective. In this direction, we propose an observability-based design of the communication topology that improves the privacy of the network in presence of an intruder. The resulting adaptive network responds to the intrusion by changing the topology of the network-in an online manner- in order to reduce the information exposed to the intruder.

OCNov 2, 2011
A Sieve Method for Consensus-type Network Tomography

Marzieh Nabi-Abdolyousefi, Mehran Mesbahi

In this note, we examine the problem of identifying the interaction geometry among a known number of agents, adopting a consensus-type algorithm for their coordination. The proposed identification process is facilitated by introducing "ports" for stimulating a subset of network vertices via an appropriately defined interface and observing the network's response at another set of vertices. It is first noted that under the assumption of controllability and observability of corresponding steered-and-observed network, the proposed procedure identifies a number of important features of the network using the spectrum of the graph Laplacian. We then proceed to use degree-based graph reconstruction methods to propose a sieve method for further characterization of the underlying network. An example demonstrates the application of the proposed method.

SYFeb 10, 2020
Nonlinear Observability via Koopman Analysis: Characterizing the Role of Symmetry

Afshin Mesbahi, Jingjing Bu, Mehran Mesbahi

This paper considers the observability of nonlinear systems from a Koopman operator theoretic perspective--and in particular--the effect of symmetry on observability. We first examine an infinite-dimensional linear system (constructed using independent Koopman eigenfunctions) such that its observability is equivalent to the observability of the original nonlinear system. Next, we derive an analytic relation between symmetry and nonlinear observability; it is shown that symmetry in the nonlinear dynamics is reflected in the symmetry of the corresponding Koopman eigenfunctions, as well as presence of repeated Koopman eigenvalues. We then proceed to show that the loss of observability in symmetric nonlinear systems can be traced back to the presence of these repeated eigenvalues. In the case where we have a sufficient number of measurements, the nonlinear system remains unobservable when these functions have symmetries that mirror those of the dynamics. The proposed observability framework provides insights into the minimum number of the measurements needed to make an unobservable nonlinear system, observable. The proposed results are then applied to a network of nano-electromechanical oscillators coupled via a symmetric interaction topology.

SYNov 20, 2023
Data-Guided Regulator for Adaptive Nonlinear Control

Niyousha Rahimi, Mehran Mesbahi

This paper addresses the problem of designing a data-driven feedback controller for complex nonlinear dynamical systems in the presence of time-varying disturbances with unknown dynamics. Such disturbances are modeled as the "unknown" part of the system dynamics. The goal is to achieve finite-time regulation of system states through direct policy updates while also generating informative data that can subsequently be used for data-driven stabilization or system identification. First, we expand upon the notion of "regularizability" and characterize this system characteristic for a linear time-varying representation of the nonlinear system with locally-bounded higher-order terms. "Rapid-regularizability" then gauges the extent by which a system can be regulated in finite time, in contrast to its asymptotic behavior. We then propose the Data-Guided Regulation for Adaptive Nonlinear Control ( DG-RAN) algorithm, an online iterative synthesis procedure that utilizes discrete time-series data from a single trajectory for regulating system states and identifying disturbance dynamics. The effectiveness of our approach is demonstrated on a 6-DOF power descent guidance problem in the presence of adverse environmental disturbances.

17.8SYMar 31
Receding-Horizon Policy Gradient for Polytopic Controller Synthesis

Shiva Shakeri, Péter Baranyi, Mehran Mesbahi

We propose the Polytopic Receding-Horizon Policy Gradient (P-RHPG) algorithm for synthesizing Parallel Distributed Compensation (PDC) controllers via Tensor Product (TP) model transformation. Standard LMI-based PDC synthesis grows increasingly conservative as model fidelity improves; P-RHPG instead solves a finite-horizon integrated cost via backward-stage decomposition. The key result is that each stage subproblem is a strongly convex quadratic in the vertex gains, a consequence of the linear independence of the HOSVD weighting functions, guaranteeing a unique global minimizer and linear convergence of gradient descent from any initialization. With zero terminal cost, the optimal cost increases monotonically to a finite limit and the gain sequence remains bounded; terminal costs satisfying a mild Lyapunov condition yield non-increasing convergence. Experiments on an aeroelastic wing benchmark confirm convergence to a unique infinite-horizon optimum across all tested terminal cost choices and near-optimal performance relative to the pointwise Riccati lower bound.

MADec 20, 2024
Multi Agent Reinforcement Learning for Sequential Satellite Assignment Problems

Joshua Holder, Natasha Jaques, Mehran Mesbahi

Assignment problems are a classic combinatorial optimization problem in which a group of agents must be assigned to a group of tasks such that maximum utility is achieved while satisfying assignment constraints. Given the utility of each agent completing each task, polynomial-time algorithms exist to solve a single assignment problem in its simplest form. However, in many modern-day applications such as satellite constellations, power grids, and mobile robot scheduling, assignment problems unfold over time, with the utility for a given assignment depending heavily on the state of the system. We apply multi-agent reinforcement learning to this problem, learning the value of assignments by bootstrapping from a known polynomial-time greedy solver and then learning from further experience. We then choose assignments using a distributed optimal assignment mechanism rather than by selecting them directly. We demonstrate that this algorithm is theoretically justified and avoids pitfalls experienced by other RL algorithms in this setting. Finally, we show that our algorithm significantly outperforms other methods in the literature, even while scaling to realistic scenarios with hundreds of agents and tasks.

LGJul 21, 2020
Adaptive Traffic Control with Deep Reinforcement Learning: Towards State-of-the-art and Beyond

Siavash Alemzadeh, Ramin Moslemi, Ratnesh Sharma et al.

In this work, we study adaptive data-guided traffic planning and control using Reinforcement Learning (RL). We shift from the plain use of classic methods towards state-of-the-art in deep RL community. We embed several recent techniques in our algorithm that improve the original Deep Q-Networks (DQN) for discrete control and discuss the traffic-related interpretations that follow. We propose a novel DQN-based algorithm for Traffic Control (called TC-DQN+) as a tool for fast and more reliable traffic decision-making. We introduce a new form of reward function which is further discussed using illustrative examples with comparisons to traditional traffic control methods.

SPJul 12, 2020
Deep Learning-based Resource Allocation for Infrastructure Resilience

Siavash Alemzadeh, Hesam Talebiyan, Shahriar Talebi et al.

From an optimization point of view, resource allocation is one of the cornerstones of research for addressing limiting factors commonly arising in applications such as power outages and traffic jams. In this paper, we take a data-driven approach to estimate an optimal nodal restoration sequence for immediate recovery of the infrastructure networks after natural disasters such as earthquakes. We generate data from td-INDP, a high-fidelity simulator of optimal restoration strategies for interdependent networks, and employ deep neural networks to approximate those strategies. Despite the fact that the underlying problem is NP-complete, the restoration sequences obtained by our method are observed to be nearly optimal. In addition, by training multiple models---the so-called estimators---for a variety of resource availability levels, our proposed method balances a trade-off between resource utilization and restoration time. Decision-makers can use our trained models to allocate resources more efficiently after contingencies, and in turn, improve the community resilience. Besides their predictive power, such trained estimators unravel the effect of interdependencies among different nodal functionalities in the restoration strategies. We showcase our methodology by the real-world interdependent infrastructure of Shelby County, TN.

SYMay 29, 2020
On Regularizability and its Application to Online Control of Unstable LTI Systems

Shahriar Talebi, Siavash Alemzadeh, Niyousha Rahimi et al.

Learning, say through direct policy updates, often requires assumptions such as knowing a priori that the initial policy (gain) is stabilizing, or persistently exciting (PE) input-output data, is available. In this paper, we examine online regulation of (possibly unstable) partially unknown linear systems with no prior access to an initial stabilizing controller nor PE input-output data; we instead leverage the knowledge of the input matrix for online regulation. First, we introduce and characterize the notion of "regularizability" for linear systems that gauges the extent by which a system can be regulated in finite-time in contrast to its asymptotic behavior (commonly characterized by stabilizability/controllability). Next, having access only to the input matrix, we propose the Data-Guided Regulation (DGR) synthesis procedure that -- as its name suggests -- regulates the underlying state while also generating informative data that can subsequently be used for data-driven stabilization or system identification. We further improve the computational performance of DGR via a rank-one update and demonstrate its utility in online regulation of the X-29 aircraft.

SYApr 17, 2019
On Topological Properties of the Set of Stabilizing Feedback Gains

Jingjing Bu, Afshin Mesbahi, Mehran Mesbahi

This work presents a fairly complete account on various topological and metrical aspects of feedback stabilization for single-input-single-output (SISO) continuous and discrete time linear-time-invariant (LTI) systems. In particular, we prove that the set of stabilizing output feedback gains for a SISO system with n states has at most $\lceil{\frac{n}{2}}\rceil$ connected components. Furthermore, our analysis yields an algorithm for determining intervals of stabilizing gains for general continuous and discrete LIT systems; the proposed algorithm also computes the number of unstable roots in each unstable interval. Along the way, we also make a number of observations on the set of stabilizing state feedback gains for MIMO systems.

SYApr 4, 2019
On Topological and Metrical Properties of Stabilizing Feedback Gains: the MIMO Case

Jingjing Bu, Afshin Mesbahi, Mehran Mesbahi

In this paper, we discuss various topological and metrical aspects of the set of stabilizing static feedback gains for multiple-input-multiple-output (MIMO) linear-time-invariant (LTI) systems, in both continuous and discrete-time. Recently, connectivity properties of this set (for continuous time) have been reported in the literature, along with a discussion on how this connectivity is affected by restricting the feedback gain to linear subspaces. We show that analogous to the continuous-time case, one can construct instances where the set of stabilizing feedback gains for discrete time LTI systems has exponentially many connected components.

LGJan 15, 2018
Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator

Maryam Fazel, Rong Ge, Sham M. Kakade et al.

Direct policy gradient methods for reinforcement learning and continuous control problems are a popular approach for a variety of reasons: 1) they are easy to implement without explicit knowledge of the underlying model 2) they are an "end-to-end" approach, directly optimizing the performance metric of interest 3) they inherently allow for richly parameterized policies. A notable drawback is that even in the most basic continuous control problem (that of linear quadratic regulators), these methods must solve a non-convex optimization problem, where little is understood about their efficiency from both computational and statistical perspectives. In contrast, system identification and model based planning in optimal control theory have a much more solid theoretical footing, where much is known with regards to their computational and statistical properties. This work bridges this gap showing that (model free) policy gradient methods globally converge to the optimal solution and are efficient (polynomially so in relevant problem dependent quantities) with regards to their sample and computational complexities.

SYAug 2, 2017
Optimal Control with Limited Sensing via Empirical Gramians and Piecewise Linear Feedback

Atiye Alaeddini, Kristi A. Morgansen, Mehran Mesbahi

This paper is concerned with the design of optimal control for finite-dimensional control-affine nonlinear dynamical systems. We introduce an optimal control problem that specifically optimizes nonlinear observability in addition to ensuring stability of the closed loop system. A recursive algorithm is then proposed to obtain an optimal state feedback controller to maximize the resulting non-quadratic cost functional. The main contribution of the paper is presenting a control synthesis procedure that provides closed loop asymptotic stability, on one hand, and empirical observability of the system, as a transient performance criteria, on the other.

OCDec 22, 2014
Online Distributed Optimization on Dynamic Networks

Saghar Hosseini, Airlie Chapman, Mehran Mesbahi

This paper presents a distributed optimization scheme over a network of agents in the presence of cost uncertainties and over switching communication topologies. Inspired by recent advances in distributed convex optimization, we propose a distributed algorithm based on a dual sub-gradient averaging. The objective of this algorithm is to minimize a cost function cooperatively. Furthermore, the algorithm changes the weights on the communication links in the network to adapt to varying reliability of neighboring agents. A convergence rate analysis as a function of the underlying network topology is then presented, followed by simulation results for representative classes of sensor networks.