SY Oct 2, 2013
Approximate Optimal Trajectory Tracking for Continuous Time Nonlinear Systems
Rushikesh Kamalapurkar, Huyen Dinh, Shubhendu Bhasin et al.
Approximate dynamic programming has been investigated and used as a method to approximately solve optimal regulation problems. However, the extension of this technique to optimal tracking problems for continuous time nonlinear systems has remained a non-trivial open problem. The control development in this paper guarantees ultimately bounded tracking of a desired trajectory, while also ensuring that the controller converges to an approximate optimal policy.
SY Dec 10, 2015
Integral Concurrent Learning: Adaptive Control with Parameter Convergence without PE or State Derivatives
Anup Parikh, Rushikesh Kamalapurkar, Warren E. Dixon
Concurrent learning is a recently developed adaptive update scheme that can be used to guarantee parameter convergence without requiring persistent excitation. However, this technique requires knowledge of state derivatives, which are usually not directly sensed and therefore must be estimated. A novel integral concurrent learning method is developed in this paper that removes the need to estimate state derivatives while maintaining parameter convergence properties. A Monte Carlo simulation illustrates improved robustness to noise compared to the traditional derivative formulation.
SY Feb 28, 2017
Model-based reinforcement learning in differential graphical games
Rushikesh Kamalapurkar, Justin R. Klotz, Patrick Walters et al.
This paper seeks to combine differential game theory with the actor-critic-identifier architecture to determine forward-in-time, approximate optimal controllers for formation tracking in multi-agent systems, where the agents have uncertain heterogeneous nonlinear dynamics. A continuous control strategy is proposed, using communication feedback from extended neighbors on a communication topology that has a spanning tree. A model-based reinforcement learning technique is developed to cooperatively control a group of agents to track a trajectory in a desired formation. Simulation results are presented to demonstrate the performance of the developed technique.
SY Dec 28, 2019
On reduction of differential inclusions and Lyapunov stability
Rushikesh Kamalapurkar, Warren E. Dixon, Andrew R. Teel
In this paper, locally Lipschitz, regular functions are utilized to identify and remove infeasible directions from set-valued maps that define differential inclusions. The resulting reduced set-valued map is point-wise smaller (in the sense of set containment) than the original set-valued map. The corresponding reduced differential inclusion, defined by the reduced set-valued map, is utilized to develop a generalized notion of a derivative for locally Lipschitz candidate Lyapunov functions in the direction(s) of a set-valued map. The developed generalized derivative yields less conservative statements of Lyapunov stability theorems, invariance theorems, invariance-like results, and Matrosov theorems for differential inclusions. Included illustrative examples demonstrate the utility of the developed theory.
SY Oct 28, 2017
Online Approximate Optimal Station Keeping of a Marine Craft in the Presence of a Current
Patrick Walters, Rushikesh Kamalapurkar, Forrest Voight et al.
Online approximation of the optimal station keeping strategy for a fully actuated six degrees-of-freedom marine craft subject to an irrotational ocean current is considered. An approximate solution to the optimal control problem is obtained using an adaptive dynamic programming technique. The hydrodynamic drift dynamics of the dynamic model are assumed to be unknown; therefore, a concurrent learning-based system identifier is developed to identify the unknown model parameters. The identified model is used to implement an adaptive model-based reinforcement learning technique to estimate the unknown value function. The developed policy guarantees uniformly ultimately bounded convergence of the vehicle to the desired station and uniformly ultimately bounded convergence of the approximated policies to the optimal policies without the requirement of persistence of excitation. The developed strategy is validated using an autonomous underwater vehicle, where the three degrees-of-freedom in the horizontal plane are regulated. The experiments are conducted in a second-magnitude spring located in central Florida.
SY Nov 21, 2018
Online inverse reinforcement learning for nonlinear systems
Ryan Self, Michael Harlan, Rushikesh Kamalapurkar
This paper focuses on the development of an online inverse reinforcement learning (IRL) technique for a class of nonlinear systems. The developed approach utilizes observed state and input trajectories, and determines the unknown cost function and the unknown value function online. A parameter estimation technique is utilized to allow the developed IRL technique to determine the cost function weights in the presence of unknown dynamics. Simulation results are presented for a nonlinear system showing convergence of both unknown reward function weights and unknown dynamics.
SY Jan 23, 2018
Inverse reinforcement learning in continuous time and space
Rushikesh Kamalapurkar
This paper develops a data-driven inverse reinforcement learning technique for a class of linear systems to estimate the cost function of an agent online, using input-output measurements. A simultaneous state and parameter estimator is utilized to facilitate output-feedback inverse reinforcement learning, and cost function estimation is achieved up to multiplication by a constant.
OC Apr 20
A Dynamic Mode Decomposition Approach to Parameter Identification
Moad Abudia, Opeyemi Owolabi, Joel A. Rosenfeld et al.
This paper presents a data-driven algorithm for simultaneous system identification and parameter estimation in control-affine nonlinear systems. Parameter estimation is achieved by training a data-driven predictive model using state-action measurements and various known values of the parameters of interest. The predictive model is then used in conjunction with state-action data corresponding to unknown values of the parameters to estimate those unknown values. Numerical experiments on the controlled Duffing oscillator with unknown damping, stiffness, and nonlinearity coefficients demonstrate accurate recovery of both the system trajectories and the unknown parameter values from data collected under open-loop excitation.
SY Oct 28, 2022
Nonuniqueness and Convergence to Equivalent Solutions in Observer-based Inverse Reinforcement Learning
Jared Town, Zachary Morrison, Rushikesh Kamalapurkar
A key challenge in solving the deterministic inverse reinforcement learning (IRL) problem online and in real-time is the existence of multiple solutions. Nonuniqueness necessitates the study of the notion of equivalent solutions, i.e., solutions that result in a different cost functional but the same feedback matrix, and of convergence to such solutions. While offline algorithms that result in convergence to equivalent solutions have been developed in the literature, online, real-time techniques that address nonuniqueness are not available. In this paper, a regularized history stack observer that converges to approximately equivalent solutions of the IRL problem is developed. Novel data-richness conditions are developed to facilitate the analysis, and simulation results are provided to demonstrate the effectiveness of the developed technique.
SY Jul 10, 2019
Output-feedback online optimal control for a class of nonlinear systems
Ryan Self, Michael Harlan, Rushikesh Kamalapurkar
In this paper an output-feedback model-based reinforcement learning (MBRL) method for a class of second-order nonlinear systems is developed. The control technique uses exact model knowledge and integrates a dynamic state estimator within the model-based reinforcement learning framework to achieve output-feedback MBRL. Simulation results demonstrate the efficacy of the developed method.
OC Apr 7
Adaptive Control with Sparse Identification of Nonlinear Dynamics
Trivikram Satharasi, Tochukwu E. Ogri, Muzaffar Qureshi et al.
This paper develops a sparsity-promoting integral concurrent learning (SP-ICL) adaptation law for a linearly parametrized uncertain nonlinear control-affine system. The unknown parameters are learned using ICL with sparsity-promoting $\ell_1$ regularization. The use of $\ell_1$ regularization for sparsity promotion is common in system identification and machine learning; however, unlike existing approaches, this paper develops an online parameter update law that integrates the regularization penalty with ICL via sliding modes. Using the SP-ICL update law, we show via non-smooth Lyapunov analysis that the trajectories of the closed-loop system are ultimately bounded. Simulations verify the effectiveness of the sparsity penalty in the SP-ICL update law on recovering sparse dynamics during trajectory tracking.
RO Mar 23
Parallel OctoMapping: A Scalable Framework for Enhanced Path Planning in Autonomous Navigation
Yihui Mao, Tian Tan, Xuehui Shen et al.
Mapping is essential in robotics and autonomous systems because it provides the spatial foundation for path planning. Efficient mapping enables planning algorithms to generate reliable paths while ensuring safety and adapting in real time to complex environments. Fixed-resolution mapping methods often produce overly conservative obstacle representations that lead to suboptimal paths or planning failures in cluttered scenes. To address this issue, we introduce Parallel OctoMapping (POMP), an efficient OctoMap-based mapping technique that maximizes available free space and supports multi-threaded computation. To the best of our knowledge, POMP is the first method that, at a fixed occupancy-grid resolution, refines the representation of free space while preserving map fidelity and compatibility with existing search-based planners. It can therefore be integrated into existing planning pipelines, yielding higher pathfinding success rates and shorter path lengths, especially in cluttered environments, while substantially improving computational efficiency.
LG Mar 29
Stability and Sensitivity Analysis of Relative Temporal-Difference Learning: Extended Version
Masoud S. Sakha, Rushikesh Kamalapurkar, Sean Meyn
Relative temporal-difference (TD) learning was introduced to mitigate the slow convergence of TD methods when the discount factor approaches one by subtracting a baseline from the temporal-difference update. While this idea has been studied in the tabular setting, stability guarantees with function approximation remain poorly understood. This paper analyzes relative TD learning with linear function approximation. We establish stability conditions for the algorithm and show that the choice of baseline distribution plays a central role. In particular, when the baseline is chosen as the empirical distribution of the state-action process, the algorithm is stable for any non-negative baseline weight and any discount factor. We also provide a sensitivity analysis of the resulting parameter estimates, characterizing both asymptotic bias and covariance. The asymptotic covariance and asymptotic bias are shown to remain uniformly bounded as the discount factor approaches one.
ML Mar 20, 2023
Fault Detection via Occupation Kernel Principal Component Analysis
Zachary Morrison, Benjamin P. Russo, Yingzhao Lian et al.
The reliable operation of automatic systems is heavily dependent on the ability to detect faults in the underlying dynamical system. While traditional model-based methods have been widely used for fault detection, data-driven approaches have garnered increasing attention due to their ease of deployment and minimal need for expert knowledge. In this paper, we present a novel principal component analysis (PCA) method that uses occupation kernels. Occupation kernels result in feature maps that are tailored to the measured data, have inherent noise-robustness due to the use of integration, and can utilize irregularly sampled system trajectories of variable lengths for PCA. The occupation kernel PCA method is used to develop a reconstruction error approach to fault detection and its efficacy is validated using numerical simulations.
SY Apr 8
Decentralized Scalar Field Mapping using Gaussian Process
Hossein Papi, Muzaffar Qureshi, Kyle Volle et al.
Decentralized Gaussian process (GP) methods offer a scalable framework for multi-agent scalar-field estimation by replacing a centralized global model with multiple local models maintained by individual agents. A team of agents operates through overlapping domains; neighboring agents generally produce inconsistent distributions over shared regions. This paper investigates whether these inter-agent posterior discrepancies can be systematically exploited to improve team-level predictive performance and answers this question positively through a novel decentralized intersection data-sharing and assimilation protocol. Specifically, each agent constructs neighbor-specific packets from its local GP together with the geometry of the overlap between subdomains and selectively assimilates information received from neighboring agents to improve consistency of its posterior over the shared regions. The proposed architecture preserves locality in both computation and communication, supports decentralized neighbor-to-neighbor data assimilation, and allows local GP models to evolve cooperatively across the network without requiring full packet exchange or centralized inference.
RO Apr 22
A Hough transform approach to safety-aware scalar field mapping using Gaussian Processes
Muzaffar Qureshi, Trivikram Satharasi, Tochukwu E. Ogri et al.
This paper presents a framework for mapping unknown scalar fields using a sensor-equipped autonomous robot operating in unsafe environments. The unsafe regions are defined as regions of high-intensity, where the field value exceeds a predefined safety threshold. For safe and efficient mapping of the scalar field, the sensor-equipped robot must avoid high-intensity regions during the measurement process. In this paper, the scalar field is modeled as a sample from a Gaussian process (GP), which enables Bayesian inference and provides closed-form expressions for both the predictive mean and the uncertainty. Concurrently, the spatial structure of the high-intensity regions is estimated in real-time using the Hough transform (HT), leveraging the evolving GP posterior. A safe sampling strategy is then employed to guide the robot towards safe measurement locations, using probabilistic safety guarantees on the evolving GP posterior. The estimated high-intensity regions also facilitate the design of safe motion plans for the robot. The effectiveness of the approach is verified through two numerical simulation studies and an indoor experiment for mapping a light-intensity field using a wheeled mobile robot.
SY Jun 6, 2021
Singular Dynamic Mode Decompositions
Joel A. Rosenfeld, Rushikesh Kamalapurkar
This manuscript is aimed at addressing several long-standing limitations of dynamic mode decompositions in the application of Koopman analysis. Principal among these limitations are the convergence of associated Dynamic Mode Decomposition algorithms and the existence of Koopman modes. To address these limitations, two major modifications are made, where Koopman operators are removed from the analysis in light of Liouville operators (known as Koopman generators in special cases), and these operators are shown to be compact for certain pairs of Hilbert spaces selected separately as the domain and range of the operator. While eigenfunctions are discarded in the general analysis, a viable reconstruction algorithm is still demonstrated, and the sacrifice of eigenfunctions realizes the theoretical goals of DMD analysis that have yet to be achieved in other contexts. However, in the case where the domain is embedded in the range, an eigenfunction approach is still achievable, where a more typical DMD routine is established, one that leverages a finite rank representation that converges in norm. The manuscript concludes with the description of two Dynamic Mode Decomposition algorithms that converge when a dense collection of occupation kernels, arising from the data, is leveraged in the analysis.
FA May 31, 2021
The kernel perspective on dynamic mode decomposition
Efrain Gonzalez, Moad Abudia, Michael Jury et al.
This manuscript revisits theoretical assumptions concerning dynamic mode decomposition (DMD) of Koopman operators, including the existence of lattices of eigenfunctions, common eigenfunctions between Koopman operators, and boundedness and compactness of Koopman operators. Counterexamples that illustrate the restrictiveness of the assumptions are provided for each of the assumptions. In particular, this manuscript proves that the native reproducing kernel Hilbert space (RKHS) of the Gaussian RBF kernel function only supports bounded Koopman operators if the dynamics are affine. In addition, a new framework for DMD that requires only densely defined Koopman operators over RKHSs is introduced, and its effectiveness is demonstrated through numerical examples.
OC May 31, 2021
Control Occupation Kernel Regression for Nonlinear Control-Affine Systems
Moad Abudia, Tejasvi Channagiri, Joel A. Rosenfeld et al.
This manuscript presents an algorithm for obtaining an approximation of a nonlinear high order control affine dynamical system. Controlled trajectories of the system are leveraged as the central unit of information via embedding them in a vector-valued reproducing kernel Hilbert space (vvRKHS). The trajectories are embedded as the so-called higher order control occupation kernels, which represent an operator on the vvRKHS corresponding to iterated integration after multiplication by a given controller. The solution to the system identification problem is then the unique solution of an infinite dimensional regularized regression problem. The representer theorem is then used to express the solution as a finite linear combination of these occupation kernels, which converts an infinite dimensional optimization problem to a finite dimensional optimization problem. The vector valued structure of the Hilbert space allows for simultaneous approximation of the drift and control effectiveness components of the control affine system. Several experiments are performed to demonstrate the effectiveness of the developed approach.
SY Nov 3, 2020
Online Observer-Based Inverse Reinforcement Learning
Ryan Self, Kevin Coleman, He Bai et al.
In this paper, a novel approach to the output-feedback inverse reinforcement learning (IRL) problem is developed by casting the IRL problem, for linear systems with quadratic cost functions, as a state estimation problem. Two observer-based techniques for IRL are developed, including a novel observer method that re-uses previous state estimates via history stacks. Theoretical guarantees for convergence and robustness are established under appropriate excitation conditions. Simulations demonstrate the performance of the developed observers and filters under noisy and noise-free measurements.
SY Aug 11, 2020
Extension of Full and Reduced Order Observers for Image-based Depth Estimation using Concurrent Learning
Ghananeel Rotithor, Daniel Trombetta, Rushikesh Kamalapurkar et al.
In this paper, concurrent learning (CL)-based full and reduced order observers for a perspective dynamical system (PDS) are developed. The PDS is a widely used model for estimating the depth of a feature point from a sequence of camera images. Building on the current progress of CL for parameter estimation in adaptive control, a state observer is developed for the PDS model where the inverse depth appears as a time-varying parameter in the dynamics. The data recorded over a sliding time window in the near past is used in the CL term to design the full and the reduced order state observers. A Lyapunov-based stability analysis is carried out to prove the uniformly ultimately bounded (UUB) stability of the developed observers. Simulation results are presented to validate the accuracy and convergence of the developed observers in terms of convergence time, root mean square error (RMSE), and mean absolute percentage error (MAPE) metrics. Real world depth estimation experiments are performed to demonstrate the performance of the observers using the aforementioned metrics on a 7-DoF manipulator with an eye-in-hand configuration.
SY Aug 29, 2017
Invariance-like results for Nonautonomous Switched Systems
Rushikesh Kamalapurkar, Joel A. Rosenfeld, Anup Parikh et al.
This paper generalizes the LaSalle-Yoshizawa Theorem to switched nonsmooth systems. Filippov and Krasovskii regularizations of a switched system are shown to be contained within the convex hull of the Filippov and Krasovskii regularizations of the subsystems, respectively. A candidate common Lyapunov function that has a negative semidefinite derivative along the trajectories of the subsystems is shown to be sufficient to establish LaSalle-Yoshizawa results for the switched system. Results for regular and non-regular candidate Lyapunov functions are presented using an appropriate generalization of the time derivative. The developed generalization is motivated by adaptive control of switched systems where the derivative of the candidate Lyapunov function is typically negative semidefinite.
SY Sep 19, 2016
Online Output-Feedback Parameter and State Estimation for Second Order Linear Systems
Rushikesh Kamalapurkar
In this paper, a concurrent learning based adaptive observer is developed for a class of second-order linear time-invariant systems with uncertain system matrices. The developed technique yields an exponentially convergent state estimator and an exponentially convergent parameter estimator. As opposed to persistent excitation required for parameter convergence in traditional adaptive methods, excitation over a finite time-interval is sufficient for the developed technique to achieve exponential convergence. Simulation results in both noise-free and noisy environments are presented to validate the design.
SY Feb 9, 2015
Efficient model-based reinforcement learning for approximate online optimal
Rushikesh Kamalapurkar, Joel A. Rosenfeld, Warren E. Dixon
In this paper the infinite horizon optimal regulation problem is solved online for a deterministic control-affine nonlinear dynamical system using the state following (StaF) kernel method to approximate the value function. Unlike traditional methods that aim to approximate a function over a large compact set, the StaF kernel method aims to approximate a function in a small neighborhood of a state that travels within a compact set. Simulation results demonstrate that stability and approximate optimality of the control system can be achieved with significantly fewer basis functions than may be required for global approximation methods.