Shaoshuai Mou

RO
h-index: 6
27 papers
431 citations
Novelty: 53%
AI Score: 55

27 Papers

SY · Mar 3, 2015
Undirected Rigid Formations are Problematic

Shaoshuai Mou, A. Stephen Morse, Mohamed Ali Belabbas et al.

By an undirected rigid formation of mobile autonomous agents is meant a formation based on graph rigidity in which each pair of "neighboring" agents is responsible for maintaining a prescribed target distance between them. In a recent paper a systematic method was proposed for devising gradient control laws for asymptotically stabilizing a large class of rigid, undirected formations in two-dimensional space assuming all agents are described by kinematic point models. The aim of this paper is to explain what happens to such formations if neighboring agents have slightly different understandings of what the desired distance between them is supposed to be or, equivalently, if neighboring agents have differing estimates of what the actual distance between them is. In either case, what one would expect would be a gradual distortion of the formation from its target shape as discrepancies in desired or sensed distances increase. While this is observed for the gradient laws in question, something else quite unexpected happens at the same time. It is shown that for any rigidity-based, undirected formation of this type which is comprised of three or more agents, if some neighboring agents have slightly different understandings of what the desired distances between them are supposed to be, then almost for certain, the trajectory of the resulting distorted but rigid formation will converge exponentially fast to a closed circular orbit in two-dimensional space which is traversed periodically at a constant angular speed.
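The effect described in this abstract can be illustrated with a minimal sketch, assuming a three-agent triangle and a gradient law written in squared distances. The names `velocities`, `simulate`, `side_lengths`, and the mismatch parameter `mu` are hypothetical and are not taken from the paper. With consistent targets the triangle settles to its target shape; with a mismatch on one edge, the two agents' contributions no longer cancel, so the centroid acquires a nonzero velocity.

```python
import numpy as np

EDGES = [(0, 1), (1, 2), (0, 2)]  # triangle: every pair of agents is neighboring


def velocities(x, d2=1.0, mu=0.0):
    """Gradient law x_i' = -sum_j (|x_i - x_j|^2 - d_ij^2) (x_i - x_j).

    mu is a hypothetical mismatch: agent 0 believes the squared target
    distance on edge (0, 1) is d2 + mu, while agent 1 believes it is d2.
    """
    v = np.zeros_like(x)
    for i, j in EDGES:
        z = x[i] - x[j]
        err = z @ z - d2
        err_i = err - mu if (i, j) == (0, 1) else err
        v[i] -= err_i * z  # agent i's share of the edge
        v[j] += err * z    # agent j's share: -(err) * (x_j - x_i)
    return v


def simulate(x0, mu=0.0, dt=0.01, steps=5000):
    """Forward-Euler integration of the closed-loop kinematics."""
    x = x0.copy()
    for _ in range(steps):
        x = x + dt * velocities(x, mu=mu)
    return x


def side_lengths(x):
    return np.array([np.linalg.norm(x[i] - x[j]) for i, j in EDGES])


x0 = np.array([[0.0, 0.0], [1.2, 0.0], [0.5, 0.9]])
matched = simulate(x0)                          # consistent targets
drift = velocities(x0, mu=0.05).mean(axis=0)    # centroid velocity under mismatch
```

In this sketch the centroid velocity under mismatch equals `mu * (x[0] - x[1]) / 3`, which cannot vanish while the two agents are apart. That is consistent with the abstract's claim that the distorted formation keeps moving rather than coming to rest.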

SY · Mar 3, 2015
A Distributed Algorithm for Solving a Linear Algebraic Equation

Shaoshuai Mou, Ji Liu, A. Stephen Morse

A distributed algorithm is described for solving a linear algebraic equation of the form $Ax=b$ assuming the equation has at least one solution. The equation is simultaneously solved by $m$ agents assuming each agent knows only a subset of the rows of the partitioned matrix $(A,b)$, the current estimates of the equation's solution generated by its neighbors, and nothing more. Each agent recursively updates its estimate by utilizing the current estimates generated by each of its neighbors. Neighbor relations are characterized by a time-dependent directed graph $\mathbb{N}(t)$ whose vertices correspond to agents and whose arcs depict neighbor relations. It is shown that for any matrix $A$ for which the equation has a solution and any sequence of "repeatedly jointly strongly connected graphs" $\mathbb{N}(t)$, $t=1,2,\ldots$, the algorithm causes all agents' estimates to converge exponentially fast to the same solution to $Ax=b$. It is also shown that the neighbor graph sequence must actually be repeatedly jointly strongly connected if exponential convergence is to be assured. A worst case convergence rate bound is derived for the case when $Ax=b$ has a unique solution. It is demonstrated that with minor modification, the algorithm can track the solution to $Ax = b$, even if $A$ and $b$ are changing with time, provided the rates of change of $A$ and $b$ are sufficiently small. It is also shown that in the absence of communication delays, exponential convergence to a solution occurs even if the times at which each agent updates its estimates are not synchronized with the update times of its neighbors. A modification of the algorithm is outlined which enables it to obtain a least squares solution to $Ax=b$ in a distributed manner, even if $Ax=b$ does not have a solution.
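A minimal sketch of one way such a scheme can work, assuming each agent holds a single row of $(A,b)$ and the neighbor graph is fixed and complete with self-loops. The update rule below (project the disagreement with the neighbor average onto the kernel of the agent's own row) is an assumption about the form of the algorithm, not something stated in the abstract. The function name `distributed_solve` and the test matrix are hypothetical.

```python
import numpy as np


def distributed_solve(A, b, iters=1000):
    """Each agent i holds one row (a_i, b_i) of (A, b) and an estimate x_i."""
    m, n = A.shape
    # Orthogonal projector onto the kernel of each agent's row.
    P = [np.eye(n) - np.outer(a, a) / (a @ a) for a in A]
    # Initialise each estimate so that it satisfies the agent's own equation.
    X = np.array([a * bi / (a @ a) for a, bi in zip(A, b)])
    for _ in range(iters):
        # Complete graph with self-loops: every agent sees the same average.
        avg = X.mean(axis=0)
        # Move toward the average only along directions that keep a_i x_i = b_i.
        X = np.array([X[i] - P[i] @ (X[i] - avg) for i in range(m)])
    return X


A = np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
x_true = np.array([1.0, 2.0, 3.0])
b = A @ x_true
X = distributed_solve(A, b)   # one row per agent
```

Because every update stays inside the solution set of the agent's own equation, agreement among the agents implies that the common estimate solves the full system.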

SY · Sep 28, 2017
Finite-Time Distributed Linear Equation Solver for Minimum $l_1$ Norm Solutions

Jingqiu Zhou, Wang Xuan, Shaoshuai Mou et al.

This paper proposes distributed algorithms for multi-agent networks to achieve a solution in finite time to a linear equation $Ax=b$ where $A$ has full row rank, and with the minimum $l_1$-norm in the underdetermined case (where $A$ has more columns than rows). The underlying network is assumed to be undirected and fixed, and an analytical proof is provided for the proposed algorithm to drive all agents' individual states to converge to a common value, viz. a solution of $Ax=b$, which is the minimum $l_1$-norm solution in the underdetermined case. Numerical simulations are also provided as validation of the proposed algorithms.

OC · Jul 10, 2018
A Resilient Convex Combination for consensus-based distributed algorithms

Xuan Wang, Shaoshuai Mou, Shreyas Sundaram

Consider a set of vectors in $\mathbb{R}^n$, partitioned into two classes: normal vectors and malicious vectors. The number of malicious vectors is bounded but their identities are unknown. The paper provides a way for achieving a resilient convex combination, which is a convex combination of only normal vectors. Compared with existing approaches based on Tverberg points, the proposed method based on the intersection of convex hulls has lower computational complexity. Simulations suggest that the proposed method can be applied to resilience for consensus-based distributed algorithms against Byzantine attacks.

SY · Nov 29, 2017
A Double-Layered Framework for Distributed Coordination in Solving Linear Equations

Xuan Wang, Shaoshuai Mou, Brian D. O. Anderson

This paper proposes a double-layered framework (or form of network) to integrate two mechanisms, termed consensus and conservation, achieving distributed solution of a linear equation. The multi-agent framework considered in the paper is composed of clusters (which serve as a form of aggregating agent) and each cluster consists of a sub-network of agents. By achieving consensus and conservation through agent-agent communications in the same cluster and cluster-cluster communications, distributed algorithms are devised for agents to cooperatively achieve a solution to the overall linear equation. These algorithms outperform existing consensus-based algorithms, including but not limited to the following aspects: first, each agent does not have to know as much as a complete row or column of the overall equation; second, each agent only needs to control as few as two scalar states when the number of clusters and the number of agents are sufficiently large; third, the dimensions of agents' states in the proposed algorithms do not have to be the same (while in contrast, algorithms based on the idea of standard consensus inherently require all agents' states to be of the same dimension). Both analytical proof and simulation results are provided to validate exponential convergence of the proposed distributed algorithms in solving linear equations.

SY · Sep 28, 2017
A Distributed Algorithm for Least Square Solutions of Linear Equations

Xuan Wang, Jingqiu Zhou, Shaoshuai Mou et al.

A distributed discrete-time algorithm is proposed for multi-agent networks to achieve a common least squares solution of a group of linear equations, in which each agent only knows some of the equations and is only able to receive information from its nearby neighbors. For fixed, connected, and undirected networks, the proposed discrete-time algorithm causes each agent's solution estimate to converge exponentially fast to the same least squares solution. Moreover, the convergence does not require careful choices of time-varying small step sizes.

SY · Aug 14, 2018
On the stability and applications of distance-based flexible formations

Hector Garcia de Marina, Zhiyong Sun, Shaoshuai Mou

This paper investigates the stability of distance-based flexible undirected formations in the plane. Without rigidity, there exists a set of connected shapes for given distance constraints, which is called the ambit. We show that a flexible formation can lose its flexibility, or equivalently may reduce the degrees of freedom of its ambit, if a small disturbance is introduced in the range sensor of the agents. The stability of the disturbed equilibrium can be characterized by analyzing the eigenvalues of the linearized augmented error system. Unlike infinitesimally rigid formations, the disturbed desired equilibrium can be turned unstable regardless of how small the disturbance is. We finally present two examples of how to exploit these disturbances as design parameters. The first example shows how to combine rigid and flexible formations such that some of the agents can move freely in the desired and locally stable ambit. The second example shows how to achieve a specific shape with fewer edges than are necessary for the standard controller in rigid formations.

67.7 · SY · Mar 11
Distributed Koopman Learning using Partial Trajectories for Control

Wenjian Hao, Zehui Lu, Devesh Upadhyay et al.

This paper proposes a distributed data-driven framework for dynamics learning, termed distributed deep Koopman learning using partial trajectories (DDKL-PT). In this framework, each agent in a multi-agent system is assigned a partial trajectory offline and locally approximates the unknown dynamics using a deep neural network within the Koopman operator framework. By exchanging local estimated dynamics rather than training data, agents achieve consensus on a global dynamics model without sharing their private training trajectories. Simulation studies on a surface vehicle demonstrate that DDKL-PT achieves consensus on the learned dynamics, and each agent attains reasonably small approximation errors on the testing dataset. Furthermore, a model predictive control scheme is developed by integrating the learned Koopman dynamics with known kinematic relations. Results on a reference-tracking task indicate that the distributedly learned dynamics are sufficiently accurate for model-based optimal control.

39.0 · RO · Apr 21
Efficient Reinforcement Learning using Linear Koopman Dynamics for Nonlinear Robotic Systems

Wenjian Hao, Yuxuan Fang, Zehui Lu et al.

This paper presents a model-based reinforcement learning (RL) framework for optimal closed-loop control of nonlinear robotic systems. The proposed approach learns linear lifted dynamics through Koopman operator theory and integrates the resulting model into an actor-critic architecture for policy optimization, where the policy represents a parameterized closed-loop controller. To reduce computational cost and mitigate model rollout errors, policy gradients are estimated using one-step predictions of the learned dynamics rather than multi-step propagation. This leads to an online mini-batch policy gradient framework that enables policy improvement from streamed interaction data. The proposed framework is evaluated on several simulated nonlinear control benchmarks and two real-world hardware platforms, including a Kinova Gen3 robotic arm and a Unitree Go1 quadruped. Experimental results demonstrate improved sample efficiency over model-free RL baselines, superior control performance relative to model-based RL baselines, and control performance comparable to classical model-based methods that rely on exact system dynamics.

RO · Sep 13, 2024
HOLA-Drone: Hypergraphic Open-ended Learning for Zero-Shot Multi-Drone Cooperative Pursuit

Yang Li, Dengyu Zhang, Junfan Chen et al.

Zero-shot coordination (ZSC) is a significant challenge in multi-agent collaboration, aiming to develop agents that can coordinate with unseen partners they have not encountered before. Recent cutting-edge ZSC methods have primarily focused on two-player video games such as OverCooked!2 and Hanabi. In this paper, we extend the scope of ZSC research to the multi-drone cooperative pursuit scenario, exploring how to construct a drone agent capable of coordinating with multiple unseen partners to capture multiple evaders. We propose a novel Hypergraphic Open-ended Learning Algorithm (HOLA-Drone) that continuously adapts the learning objective based on our hypergraphic-form game modeling, aiming to improve cooperative abilities with multiple unknown drone teammates. To empirically verify the effectiveness of HOLA-Drone, we build two different unseen drone teammate pools to evaluate their performance in coordination with various unseen partners. The experimental results demonstrate that HOLA-Drone outperforms the baseline methods in coordination with unseen drone teammates. Furthermore, real-world experiments validate the feasibility of HOLA-Drone in physical systems. Videos can be found on the project homepage: https://sites.google.com/view/hola-drone.

30.6 · RO · Apr 10
Online Intention Prediction via Control-Informed Learning

Tianyu Zhou, Zihao Liang, Zehui Lu et al.

This paper presents an online intention prediction framework for estimating the goal state of autonomous systems in real time, even when intention is time-varying, and system dynamics or objectives include unknown parameters. The problem is formulated as an inverse optimal control / inverse reinforcement learning task, with the intention treated as a parameter in the objective. A shifting horizon strategy discounts outdated information, while online control-informed learning enables efficient gradient computation and online parameter updates. Simulations under varying noise levels and hardware experiments on a quadrotor drone demonstrate that the proposed approach achieves accurate, adaptive intention prediction in complex environments.

LG · Dec 7, 2023
Distributed Optimization via Kernelized Multi-armed Bandits

Ayush Rai, Shaoshuai Mou

Multi-armed bandit algorithms provide solutions for sequential decision-making where learning takes place by interacting with the environment. In this work, we model a distributed optimization problem as a multi-agent kernelized multi-armed bandit problem with a heterogeneous reward setting. In this setup, the agents collaboratively aim to maximize a global objective function which is an average of local objective functions. The agents can access only bandit feedback (noisy reward) obtained from the associated unknown local function with a small norm in reproducing kernel Hilbert space (RKHS). We present a fully decentralized algorithm, Multi-agent IGP-UCB (MA-IGP-UCB), which achieves a sub-linear regret bound for popular classes for kernels while preserving privacy. It does not necessitate the agents to share their actions, rewards, or estimates of their local function. In the proposed approach, the agents sample their individual local functions in a way that benefits the whole network by utilizing a running consensus to estimate the upper confidence bound on the global function. Furthermore, we propose an extension, Multi-agent Delayed IGP-UCB (MAD-IGP-UCB) algorithm, which reduces the dependence of the regret bound on the number of agents in the network. It provides improved performance by utilizing a delay in the estimation update step at the cost of more communication.

SY · Dec 15, 2025
Safe Online Control-Informed Learning

Tianyu Zhou, Zihao Liang, Zehui Lu et al.

This paper proposes a Safe Online Control-Informed Learning framework for safety-critical autonomous systems. The framework unifies optimal control, parameter estimation, and safety constraints into an online learning process. It employs an extended Kalman filter to incrementally update system parameters in real time, enabling robust and data-efficient adaptation under uncertainty. A softplus barrier function enforces constraint satisfaction during learning and control while eliminating the dependence on high-quality initial guesses. Theoretical analysis establishes convergence and safety guarantees, and the framework's effectiveness is demonstrated on cart-pole and robot-arm systems.

SY · Sep 16, 2025
Deep Koopman Learning using Noisy Data

Wenjian Hao, Devesh Upadhyay, Shaoshuai Mou

This paper proposes a data-driven framework to learn a finite-dimensional approximation of a Koopman operator for approximating the state evolution of a dynamical system under noisy observations. To this end, our proposed solution has two main advantages. First, the proposed method only requires the measurement noise to be bounded. Second, the proposed method modifies the existing deep Koopman operator formulations by characterizing the effect of the measurement noise on the Koopman operator learning and then mitigating it by updating the tunable parameter of the observable functions of the Koopman operator, making it easy to implement. The performance of the proposed method is demonstrated on several standard benchmarks. We then compare the presented method with similar methods proposed in the latest literature on Koopman learning.
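For orientation, a finite-dimensional Koopman approximation can be sketched with a fixed dictionary of observables and a least-squares fit, a technique commonly called extended dynamic mode decomposition. This is a noise-free stand-in and not the deep, noise-aware method the abstract describes. The system, the parameters `a` and `b`, and the helper names `step` and `lift` are hypothetical, chosen so that the lifted dynamics are exactly linear.

```python
import numpy as np

a, b = 0.9, 0.5  # hypothetical system parameters


def step(x):
    """Nonlinear map with a known 3-dimensional Koopman-invariant subspace."""
    return np.array([a * x[0], b * x[1] + (a**2 - b) * x[0] ** 2])


def lift(x):
    """Fixed dictionary of observables: [x1, x2, x1^2]."""
    return np.array([x[0], x[1], x[0] ** 2])


rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 2))          # sampled states
Phi = np.array([lift(x) for x in X])              # lifted states, shape (50, 3)
Phi_next = np.array([lift(step(x)) for x in X])   # lifted successor states

# Least-squares fit of Phi_next = Phi @ K.T for the lifted linear map K.
K = np.linalg.lstsq(Phi, Phi_next, rcond=None)[0].T

# For this particular system the lifted dynamics are exactly linear.
K_exact = np.array([[a, 0.0, 0.0], [0.0, b, a**2 - b], [0.0, 0.0, a**2]])
```

With measurement noise added to `X`, the plain least-squares estimate would generally be biased, which is the kind of effect a noise-aware formulation has to account for.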

SY · Jul 26, 2025
Deep Koopman Learning of Nonlinear Time-Varying Systems

Wenjian Hao, Bowen Huang, Wei Pan et al.

This paper presents a data-driven approach to approximate the dynamics of a nonlinear time-varying system (NTVS) by a linear time-varying system (LTVS), which results from the Koopman operator and deep neural networks. Analysis of the approximation error between states of the NTVS and the resulting LTVS is presented. Simulations on a representative NTVS show that the proposed method achieves small approximation errors, even when the system changes rapidly. Furthermore, simulations in an example of quadcopters demonstrate the computational efficiency of the proposed approach.

CV · Mar 24, 2025
Leveraging Perturbation Robustness to Enhance Out-of-Distribution Detection

Wenxi Chen, Raymond A. Yeh, Shaoshuai Mou et al.

Out-of-distribution (OOD) detection is the task of identifying inputs that deviate from the training data distribution. This capability is essential for safely deploying deep computer vision models in open-world environments. In this work, we propose a post-hoc method, Perturbation-Rectified OOD detection (PRO), based on the insight that prediction confidence for OOD inputs is more susceptible to reduction under perturbation than that for in-distribution (IND) inputs. Based on this observation, we propose an adversarial score function that searches for the local minimum scores near the original inputs by applying gradient descent. This procedure enhances the separability between IND and OOD samples. Importantly, the approach improves OOD detection performance without complex modifications to the underlying model architectures. We conduct extensive experiments using the OpenOOD benchmark (Yang et al., 2022). Our approach further pushes the limit of softmax-based OOD detection and is the leading post-hoc method for small-scale models. On a CIFAR-10 model with adversarial training, PRO effectively detects near-OOD inputs, achieving a reduction of more than 10% on FPR@95 compared to state-of-the-art methods.

LG · May 24, 2023
Adaptive Policy Learning to Additional Tasks

Wenjian Hao, Zehui Lu, Zihao Liang et al.

This paper develops a policy learning method for tuning a pre-trained policy to adapt to additional tasks without altering the original task. A method named Adaptive Policy Gradient (APG) is proposed in this paper, which combines Bellman's principle of optimality with the policy gradient approach to improve the convergence rate. This paper provides theoretical analysis which guarantees the convergence rate and sample complexity of $\mathcal{O}(1/T)$ and $\mathcal{O}(1/\epsilon)$, respectively, where $T$ denotes the number of iterations and $\epsilon$ denotes the accuracy of the resulting stationary policy. Furthermore, several challenging numerical simulations, including cartpole, lunar lander, and robot arm, are provided to show that APG obtains similar performance compared to existing deterministic policy gradient methods while utilizing much less data and converging at a faster rate.

LG · May 24, 2023
Optimal Control of Nonlinear Systems with Unknown Dynamics

Wenjian Hao, Paulo C. Heredia, Shaoshuai Mou

This paper presents a data-driven method to find a closed-loop optimal controller, which minimizes a specified infinite-horizon cost function for systems with unknown dynamics. Suppose the closed-loop optimal controller can be parameterized by a given class of functions, hereafter referred to as the policy. The proposed method introduces a novel gradient estimation framework, which approximates the gradient of the cost function with respect to the policy parameters via integrating the Koopman operator with the classical concept of actor-critic. This enables the policy parameters to be tuned iteratively using gradient descent to achieve an optimal controller, leveraging the linearity of the Koopman operator. The convergence analysis of the proposed framework is provided. The control performance of the proposed method is evaluated through simulations compared with classical optimal control methods that usually assume the dynamics are known.

LG · May 31, 2021
Safe Pontryagin Differentiable Programming

Wanxin Jin, Shaoshuai Mou, George J. Pappas

We propose a Safe Pontryagin Differentiable Programming (Safe PDP) methodology, which establishes a theoretical and algorithmic framework to solve a broad class of safety-critical learning and control tasks -- problems that require the guarantee of safety constraint satisfaction at any stage of the learning and control progress. In the spirit of interior-point methods, Safe PDP handles different types of system constraints on states and inputs by incorporating them into the cost or loss through barrier functions. We prove three fundamentals of the proposed Safe PDP: first, both the solution and its gradient in the backward pass can be approximated by solving their more efficient unconstrained counterparts; second, the approximation for both the solution and its gradient can be controlled for arbitrary accuracy by a barrier parameter; and third, importantly, all intermediate results throughout the approximation and optimization strictly respect the constraints, thus guaranteeing safety throughout the entire learning and control process. We demonstrate the capabilities of Safe PDP in solving various safety-critical tasks, including safe policy optimization, safe motion planning, and learning MPCs from demonstrations, on different challenging systems such as 6-DoF maneuvering quadrotor and 6-DoF rocket powered landing.
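As a generic illustration of the interior-point idea this abstract invokes, a constrained problem can be replaced by an unconstrained one with a logarithmic barrier. The symbols $J$, $g_i$, $\theta$, $q$, and $\gamma$ below are assumptions made for illustration, and the paper's exact barrier formulation may differ.

```latex
\min_{\theta}\; J(\theta) \quad \text{s.t.}\quad g_i(\theta) \le 0,\; i = 1,\dots,q
\qquad\longrightarrow\qquad
\min_{\theta}\; J(\theta) \;-\; \gamma \sum_{i=1}^{q} \ln\bigl(-g_i(\theta)\bigr),
\quad \gamma > 0 .
```

The barrier term is finite only where every $g_i(\theta) < 0$, so any iterate with a finite barrier value is strictly feasible, and a smaller barrier parameter $\gamma$ brings the barrier problem closer to the original constrained problem.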

RO · Nov 30, 2020
Learning from Human Directional Corrections

Wanxin Jin, Todd D. Murphey, Zehui Lu et al.

This paper proposes a novel approach that enables a robot to learn an objective function incrementally from human directional corrections. Existing methods learn from human magnitude corrections; since a human needs to carefully choose the magnitude of each correction, those methods can easily lead to over-corrections and learning inefficiency. The proposed method only requires human directional corrections -- corrections that only indicate the direction of an input change without indicating its magnitude. We only assume that each correction, regardless of its magnitude, points in a direction that improves the robot's current motion relative to an unknown objective function. The allowable corrections satisfying this assumption account for half of the input space, as opposed to the magnitude corrections which have to lie in a shrinking level set. For each directional correction, the proposed method updates the estimate of the objective function based on a cutting plane method, which has a geometric interpretation. We have established theoretical results to show the convergence of the learning process. The proposed method has been tested in numerical examples, a user study on two human-robot games, and a real-world quadrotor experiment. The results confirm the convergence of the proposed method and further show that the method is significantly more effective (higher success rate), efficient/effortless (fewer human corrections needed), and potentially more accessible (fewer early wasted trials) than the state-of-the-art robot learning frameworks.

RO · Oct 28, 2020
Learning Objective Functions Incrementally by Inverse Optimal Control

Zihao Liang, Wanxin Jin, Shaoshuai Mou

This paper proposes an inverse optimal control method which enables a robot to incrementally learn a control objective function from a collection of trajectory segments. Here, "incrementally" means that the collection of trajectory segments grows as additional segments are provided over time. The unknown objective function is parameterized as a weighted sum of features with unknown weights. Each trajectory segment is a small snippet of an optimal trajectory. The proposed method shows that each trajectory segment, if informative, can impose a linear constraint on the unknown weights; thus the objective function can be learned by incrementally incorporating all informative segments. Effectiveness of the method is shown on a simulated 2-link robot arm and a 6-DoF maneuvering quadrotor system, in each of which only small demonstration segments are available.

RO · Aug 5, 2020
Learning from Sparse Demonstrations

Wanxin Jin, Todd D. Murphey, Dana Kulić et al.

This paper develops the method of Continuous Pontryagin Differentiable Programming (Continuous PDP), which enables a robot to learn an objective function from a few sparsely demonstrated keyframes. The keyframes, labeled with some time stamps, are the desired task-space outputs, which a robot is expected to follow sequentially. The time stamps of the keyframes can be different from the time of the robot's actual execution. The method jointly finds an objective function and a time-warping function such that the robot's resulting trajectory sequentially follows the keyframes with minimal discrepancy loss. The Continuous PDP minimizes the discrepancy loss using projected gradient descent, by efficiently solving the gradient of the robot trajectory with respect to the unknown parameters. The method is first evaluated on a simulated robot arm and then applied to a 6-DoF quadrotor to learn an objective function for motion planning in unmodeled environments. The results show the efficiency of the method, its ability to handle time misalignment between keyframes and robot execution, and the generalization of objective learning into unseen motion conditions.

SY · Jun 15, 2020
Neural Certificates for Safe Control Policies

Wanxin Jin, Zhaoran Wang, Zhuoran Yang et al.

This paper develops an approach to learn a policy of a dynamical system that is guaranteed to be both provably safe and goal-reaching. Here, safety means that a policy must not drive the state of the system to any unsafe region, while goal-reaching requires that the trajectory of the controlled system asymptotically converge to a goal region (a generalization of stability). We obtain the safe and goal-reaching policy by jointly learning two additional certificate functions: a barrier function that guarantees the safety and a developed Lyapunov-like function to fulfill the goal-reaching requirement, both of which are represented by neural networks. We show the effectiveness of the method to learn both safe and goal-reaching policies on various systems, including pendulums, cart-poles, and UAVs.

LG · Dec 30, 2019
Pontryagin Differentiable Programming: An End-to-End Learning and Control Framework

Wanxin Jin, Zhaoran Wang, Zhuoran Yang et al.

This paper develops a Pontryagin Differentiable Programming (PDP) methodology, which establishes a unified framework to solve a broad class of learning and control tasks. The PDP is distinguished from existing methods by two novel techniques: first, we differentiate through Pontryagin's Maximum Principle, and this allows us to obtain the analytical derivative of a trajectory with respect to tunable parameters within an optimal control system, enabling end-to-end learning of dynamics, policies, and/or control objective functions; and second, we propose an auxiliary control system in the backward pass of the PDP framework, and the output of this auxiliary control system is the analytical derivative of the original system's trajectory with respect to the parameters, which can be iteratively solved using standard control tools. We investigate three learning modes of the PDP: inverse reinforcement learning, system identification, and control/planning. We demonstrate the capability of the PDP in each learning mode on different high-dimensional systems, including multi-link robot arm, 6-DoF maneuvering quadrotor, and 6-DoF rocket powered landing.

CR · Dec 25, 2019
Grand Challenges in Resilience: Autonomous System Resilience through Design and Runtime Measures

Saurabh Bagchi, Vaneet Aggarwal, Somali Chaterji et al.

A set of about 80 researchers, practitioners, and federal agency program managers participated in the NSF-sponsored Grand Challenges in Resilience Workshop held on the Purdue campus on March 19-21, 2019. The workshop was divided into three themes: resilience in cyber, cyber-physical, and socio-technical systems. About 30 attendees in all participated in the discussions of cyber resilience. This article brings out the substantive parts of the challenges and solution approaches that were identified in the cyber resilience theme. In this article, we put forward the substantial challenges in cyber resilience in a few representative application domains and outline foundational solutions to address these challenges. These solutions fall into two broad themes: resilience-by-design and resilience-by-reaction. We use examples of autonomous systems as the application drivers motivating cyber resilience. We focus on some autonomous systems in the near horizon (autonomous ground and aerial vehicles) and also a little more distant (autonomous rescue and relief). For resilience-by-design, we focus on design methods in software that are needed for our cyber systems to be resilient. In contrast, for resilience-by-reaction, we discuss how to make systems resilient by responding, reconfiguring, or recovering at runtime when failures happen. We also discuss the notion of adaptive execution to improve resilience, that is, executing transparently and adaptively among available execution platforms (mobile/embedded, edge, and cloud). For each of the two themes, we survey the current state, the desired state, and ways to get there. We conclude the paper by looking at the research challenges we will have to solve in the short and the mid-term to make the vision of resilient autonomous systems a reality.

RO · Mar 21, 2018
Inverse Optimal Control from Incomplete Trajectory Observations

Wanxin Jin, Dana Kulić, Shaoshuai Mou et al.

This article develops a methodology that enables learning an objective function of an optimal control system from incomplete trajectory observations. The objective function is assumed to be a weighted sum of features (or basis functions) with unknown weights, and the observed data is a segment of a trajectory of system states and inputs. The proposed technique introduces the concept of the recovery matrix to establish the relationship between any available segment of the trajectory and the weights of given candidate features. The rank of the recovery matrix indicates whether a subset of relevant features can be found among the candidate features and the corresponding weights can be learned from the segment data. The recovery matrix can be obtained iteratively and its rank non-decreasing property shows that additional observations may contribute to the objective learning. Based on the recovery matrix, a method for using incomplete trajectory observations to learn the weights of selected features is established, and an incremental inverse optimal control algorithm is developed by automatically finding the minimal required observation. The effectiveness of the proposed method is demonstrated on a linear quadratic regulator system and a simulated robot manipulator.

SY · Sep 15, 2015
Decentralized gradient algorithm for solution of a linear equation

Brian D. O. Anderson, Shaoshuai Mou, A. Stephen Morse et al.

The paper develops a technique for solving a linear equation $Ax=b$ with a square and nonsingular matrix $A$, using a decentralized gradient algorithm. In the language of control theory, there are $n$ agents, each storing at time $t$ an $n$-vector, call it $x_i(t)$, and a graphical structure associating with each agent a vertex of a fixed, undirected and connected but otherwise arbitrary graph $\mathcal G$ with vertex set and edge set $\mathcal V$ and $\mathcal E$ respectively. We provide differential equation update laws for the $x_i$ with the property that each $x_i$ converges to the solution of the linear equation exponentially fast. The equation for $x_i$ includes additive terms weighting those $x_j$ for which vertices in $\mathcal G$ corresponding to the $i$-th and $j$-th agents are adjacent. The results are extended to the case where $A$ is not square but has full row rank, and bounds are given on the convergence rate.