Evangelos Theodorou

RO
h-index47
18papers
628citations
Novelty53%
AI Score41

18 Papers

SYJun 15, 2016
Nonparametric Infinite Horizon Kullback-Leibler Stochastic Control

Yunpeng Pan, Evangelos Theodorou

We present two nonparametric approaches to Kullback-Leibler (KL) control, or linearly-solvable Markov decision problem (LMDP) based on Gaussian processes (GP) and Nyström approximation. Compared to recently developed parametric methods, the proposed data-driven frameworks feature accurate function approximation and efficient on-line operations. Theoretically, we derive the mathematical connection of KL control based on dynamic programming with earlier work in control theory which relies on information theoretic dualities for the infinite time horizon case. Algorithmically, we give explicit optimal control policies in nonparametric forms, and propose on-line update schemes with budgeted computational costs. Numerical results demonstrate the effectiveness and usefulness of the proposed frameworks.

SYFeb 15, 2017
Pseudospectral Model Predictive Control under Partially Learned Dynamics

Manan Gandhi, Yunpeng Pan, Evangelos Theodorou

Trajectory optimization of a controlled dynamical system is an essential part of autonomy, however many trajectory optimization techniques are limited by the fidelity of the underlying parametric model. In the field of robotics, a lack of model knowledge can be overcome with machine learning techniques, utilizing measurements to build a dynamical model from the data. This paper aims to take the middle ground between these two approaches by introducing a semi-parametric representation of the underlying system dynamics. Our goal is to leverage the considerable information contained in a traditional physics based model and combine it with a data-driven, non-parametric regression technique known as a Gaussian Process. Integrating this semi-parametric model with model predictive pseudospectral control, we demonstrate this technique on both a cart pole and quadrotor simulation with unmodeled damping and parametric error. In order to manage parametric uncertainty, we introduce an algorithm that utilizes Sparse Spectrum Gaussian Processes (SSGP) for online learning after each rollout. We implement this online learning technique on a cart pole and quadrator, then demonstrate the use of online learning and obstacle avoidance for the dubin vehicle dynamics.

LGFeb 4
Generalized Schrödinger Bridge on Graphs

Panagiotis Theodoropoulos, Juno Nam, Evangelos Theodorou et al.

Transportation on graphs is a fundamental challenge across many domains, where decisions must respect topological and operational constraints. Despite the need for actionable policies, existing graph-transport methods lack this expressivity. They rely on restrictive assumptions, fail to generalize across sparse topologies, and scale poorly with graph size and time horizon. To address these issues, we introduce Generalized Schrödinger Bridge on Graphs (GSBoG), a novel scalable data-driven framework for learning executable controlled continuous-time Markov chain (CTMC) policies on arbitrary graphs under state cost augmented dynamics. Notably, GSBoG learns trajectory-level policies, avoiding dense global solvers and thereby enhancing scalability. This is achieved via a likelihood optimization approach, satisfying the endpoint marginals, while simultaneously optimizing intermediate behavior under state-dependent running costs. Extensive experimentation on challenging real-world graph topologies shows that GSBoG reliably learns accurate, topology-respecting policies while optimizing application-specific intermediate state costs, highlighting its broad applicability and paving new avenues for cost-aware dynamical transport on general graphs.

ROApr 23, 2025
Meta-Learning Online Dynamics Model Adaptation in Off-Road Autonomous Driving

Jacob Levy, Jason Gibson, Bogdan Vlahov et al.

High-speed off-road autonomous driving presents unique challenges due to complex, evolving terrain characteristics and the difficulty of accurately modeling terrain-vehicle interactions. While dynamics models used in model-based control can be learned from real-world data, they often struggle to generalize to unseen terrain, making real-time adaptation essential. We propose a novel framework that combines a Kalman filter-based online adaptation scheme with meta-learned parameters to address these challenges. Offline meta-learning optimizes the basis functions along which adaptation occurs, as well as the adaptation parameters, while online adaptation dynamically adjusts the onboard dynamics model in real time for model-based control. We validate our approach through extensive experiments, including real-world testing on a full-scale autonomous off-road vehicle, demonstrating that our method outperforms baseline approaches in prediction accuracy, performance, and safety metrics, particularly in safety-critical scenarios. Our results underscore the effectiveness of meta-learned dynamics model adaptation, advancing the development of reliable autonomous systems capable of navigating diverse and unseen environments. Video is available at: https://youtu.be/cCKHHrDRQEA

ROSep 24, 2021
Trajectory Distribution Control for Model Predictive Path Integral Control using Covariance Steering

Ji Yin, Zhiyuan Zhang, Evangelos Theodorou et al.

This paper presents a novel control approach for autonomous systems operating under uncertainty. We combine Model Predictive Path Integral (MPPI) control with Covariance Steering (CS) theory to obtain a robust controller for general nonlinear systems. The proposed Covariance-Controlled Model Predictive Path Integral (CC-MPPI) controller addresses the performance degradation observed in some MPPI implementations owing to unexpected disturbances and uncertainties. Namely, in cases where the environment changes too fast or the simulated dynamics during the MPPI rollouts do not capture the noise and uncertainty in the actual dynamics, the baseline MPPI implementation may lead to divergence. The proposed CC-MPPI controller avoids divergence by controlling the dispersion of the rollout trajectories at the end of the prediction horizon. Furthermore, the CC-MPPI has adjustable trajectory sampling distributions that can be changed according to the environment to achieve efficient sampling. Numerical examples using a ground vehicle navigating in challenging environments demonstrate the proposed approach.

LGJun 4, 2021
Robustifying Reinforcement Learning Policies with $\mathcal{L}_1$ Adaptive Control

Yikun Cheng, Pan Zhao, Manan Gandhi et al.

A reinforcement learning (RL) policy trained in a nominal environment could fail in a new/perturbed environment due to the existence of dynamic variations. Existing robust methods try to obtain a fixed policy for all envisioned dynamic variation scenarios through robust or adversarial training. These methods could lead to conservative performance due to emphasis on the worst case, and often involve tedious modifications to the training environment. We propose an approach to robustifying a pre-trained non-robust RL policy with $\mathcal{L}_1$ adaptive control. Leveraging the capability of an $\mathcal{L}_1$ control law in the fast estimation of and active compensation for dynamic variations, our approach can significantly improve the robustness of an RL policy trained in a standard (i.e., non-robust) way, either in a simulator or in the real world. Numerical experiments are provided to validate the efficacy of the proposed approach.

APMar 4, 2021
Exploring the representativeness of the M5 competition data

Evangelos Theodorou, Shengjie Wang, Yanfei Kang et al.

The main objective of the M5 competition, which focused on forecasting the hierarchical unit sales of Walmart, was to evaluate the accuracy and uncertainty of forecasting methods in the field in order to identify best practices and highlight their practical implications. However, whether the findings of the M5 competition can be generalized and exploited by retail firms to better support their decisions and operation depends on the extent to which the M5 data is sufficiently similar to unit sales data of retailers that operate in different regions, sell different types of products, and consider different marketing strategies. To answer this question, we analyze the characteristics of the M5 time series and compare them with those of two grocery retailers, namely Corporación Favorita and a major Greek supermarket chain, using feature spaces. Our results suggest that there are only small discrepancies between the examined data sets, supporting the representativeness of the M5 data.

ROSep 11, 2020
Autonomous Hybrid Ground/Aerial Mobility in Unknown Environments

David D. Fan, Rohan Thakker, Tara Bartlett et al.

Hybrid ground and aerial vehicles can possess distinct advantages over ground-only or flight-only designs in terms of energy savings and increased mobility. In this work we outline our unified framework for controls, planning, and autonomy of hybrid ground/air vehicles. Our contribution is three-fold: 1) We develop a control scheme for the control of passive two-wheeled hybrid ground/aerial vehicles. 2) We present a unified planner for both rolling and flying by leveraging differential flatness mappings. 3) We conduct experiments leveraging mapping and global planning for hybrid mobility in unknown environments, showing that hybrid mobility uses up to five times less energy than flying only.

SYSep 8, 2020
Contraction $\mathcal{L}_1$-Adaptive Control using Gaussian Processes

Aditya Gahlawat, Arun Lakshmanan, Lin Song et al.

We present $\mathcal{CL}_1$-$\mathcal{GP}$, a control framework that enables safe simultaneous learning and control for systems subject to uncertainties. The two main constituents are contraction theory-based $\mathcal{L}_1$ ($\mathcal{CL}_1$) control and Bayesian learning in the form of Gaussian process (GP) regression. The $\mathcal{CL}_1$ controller ensures that control objectives are met while providing safety certificates. Furthermore, $\mathcal{CL}_1$-$\mathcal{GP}$ incorporates any available data into a GP model of uncertainties, which improves performance and enables the motion planner to achieve optimality safely. This way, the safe operation of the system is always guaranteed, even during the learning transients. We provide a few illustrative examples for the safe learning and control of planar quadrotor systems in a variety of environments.

OCJun 22, 2020
Forward-Backward Rapidly-Exploring Random Trees for Stochastic Optimal Control

Kelsey P. Hawkins, Ali Pakniyat, Evangelos Theodorou et al.

We propose a numerical method for the computation of the forward-backward stochastic differential equations (FBSDE) appearing in the Feynman-Kac representation of the value function in stochastic optimal control problems. By the use of the Girsanov change of probability measures, it is demonstrated how a rapidly-exploring random tree (RRT) method can be utilized for the forward integration pass, as long as the controlled drift terms are appropriately compensated in the backward integration pass. Subsequently, a numerical approximation of the value function is proposed by solving a series of function approximation problems backwards in time along the edges of the constructed RRT. Moreover, a local entropy-weighted least squares Monte Carlo (LSMC) method is developed to concentrate function approximation accuracy in regions most likely to be visited by optimally controlled trajectories. The results of the proposed methodology are demonstrated on linear and nonlinear stochastic optimal control problems with non-quadratic running costs, which reveal significant convergence improvements over previous FBSDE-based numerical solution methods.

MLJun 25, 2018
Propagating Uncertainty through the tanh Function with Application to Reservoir Computing

Manan Gandhi, Keuntaek Lee, Yunpeng Pan et al.

Many neural networks use the tanh activation function, however when given a probability distribution as input, the problem of computing the output distribution in neural networks with tanh activation has not yet been addressed. One important example is the initialization of the echo state network in reservoir computing, where random initialization of the reservoir requires time to wash out the initial conditions, thereby wasting precious data and computational resources. Motivated by this problem, we propose a novel solution utilizing a moment based approach to propagate uncertainty through an Echo State Network to reduce the washout time. In this work, we contribute two new methods to propagate uncertainty through the tanh activation function and propose the Probabilistic Echo State Network (PESN), a method that is shown to have better average performance than deterministic Echo State Networks given the random initialization of reservoir states. Additionally we test single and multi-step uncertainty propagation of our method on two regression tasks and show that we are able to recover similar means and variances as computed by Monte-Carlo simulations.

LGFeb 15, 2018
MPC-Inspired Neural Network Policies for Sequential Decision Making

Marcus Pereira, David D. Fan, Gabriel Nakajima An et al.

In this paper we investigate the use of MPC-inspired neural network policies for sequential decision making. We introduce an extension to the DAgger algorithm for training such policies and show how they have improved training performance and generalization capabilities. We take advantage of this extension to show scalable and efficient training of complex planning policy architectures in continuous state and action spaces. We provide an extensive comparison of neural network policies by considering feed forward policies, recurrent policies, and recurrent policies with planning structure inspired by the Path Integral control framework. Our results suggest that MPC-type recurrent policies have better robustness to disturbances and modeling error.

ROSep 21, 2017
Agile Autonomous Driving using End-to-End Deep Imitation Learning

Yunpeng Pan, Ching-An Cheng, Kamil Saigol et al.

We present an end-to-end imitation learning system for agile, off-road autonomous driving using only low-cost sensors. By imitating a model predictive controller equipped with advanced sensors, we train a deep neural network control policy to map raw, high-dimensional observations to continuous steering and throttle commands. Compared with recent approaches to similar tasks, our method requires neither state estimation nor on-the-fly planning to navigate the vehicle. Our approach relies on, and experimentally validates, recent imitation learning theory. Empirically, we show that policies trained with online imitation learning overcome well-known challenges related to covariate shift and generalize better than policies trained with batch imitation learning. Built on these insights, our autonomous driving system demonstrates successful high-speed off-road driving, matching the state-of-the-art performance.

LGJan 30, 2017
Variational Policy for Guiding Point Processes

Yichen Wang, Grady Williams, Evangelos Theodorou et al.

Temporal point processes have been widely applied to model event sequence data generated by online users. In this paper, we consider the problem of how to design the optimal control policy for point processes, such that the stochastic system driven by the point process is steered to a target state. In particular, we exploit the key insight to view the stochastic optimal control problem from the perspective of optimal measure and variational inference. We further propose a convex optimization framework and an efficient algorithm to update the policy adaptively to the current system state. Experiments on synthetic and real-world data show that our algorithm can steer the user activities much more accurately and efficiently than other stochastic control methods.

ROAug 22, 2016
Adaptive Probabilistic Trajectory Optimization via Efficient Approximate Inference

Yunpeng Pan, Xinyan Yan, Evangelos Theodorou et al.

Robotic systems must be able to quickly and robustly make decisions when operating in uncertain and dynamic environments. While Reinforcement Learning (RL) can be used to compute optimal policies with little prior knowledge about the environment, it suffers from slow convergence. An alternative approach is Model Predictive Control (MPC), which optimizes policies quickly, but also requires accurate models of the system dynamics and environment. In this paper we propose a new approach, adaptive probabilistic trajectory optimization, that combines the benefits of RL and MPC. Our method uses scalable approximate inference to learn and updates probabilistic models in an online incremental fashion while also computing optimal control policies via successive local approximations. We present two variations of our algorithm based on the Sparse Spectrum Gaussian Process (SSGP) model, and we test our algorithm on three learning tasks, demonstrating the effectiveness and efficiency of our approach.

SYSep 3, 2015
Model Predictive Path Integral Control using Covariance Variable Importance Sampling

Grady Williams, Andrew Aldrich, Evangelos Theodorou

In this paper we develop a Model Predictive Path Integral (MPPI) control algorithm based on a generalized importance sampling scheme and perform parallel optimization via sampling using a Graphics Processing Unit (GPU). The proposed generalized importance sampling scheme allows for changes in the drift and diffusion terms of stochastic diffusion processes and plays a significant role in the performance of the model predictive control algorithm. We compare the proposed algorithm in simulation with a model predictive control version of differential dynamic programming.

SYSep 3, 2015
Model Based Reinforcement Learning with Final Time Horizon Optimization

Wei Sun, Evangelos Theodorou, Panagiotis Tsiotras

We present one of the first algorithms on model based reinforcement learning and trajectory optimization with free final time horizon. Grounded on the optimal control theory and Dynamic Programming, we derive a set of backward differential equations that propagate the value function and provide the optimal control policy and the optimal time horizon. The resulting policy generalizes previous results in model based trajectory optimization. Our analysis shows that the proposed algorithm recovers the theoretical optimal solution on linear low dimensional problem. Finally we provide application results on nonlinear systems.

ROMay 28, 2014
Information-Theoretic Stochastic Optimal Control via Incremental Sampling-based Algorithms

Oktay Arslan, Evangelos Theodorou, Panagiotis Tsiotras

This paper considers optimal control of dynamical systems which are represented by nonlinear stochastic differential equations. It is well-known that the optimal control policy for this problem can be obtained as a function of a value function that satisfies a nonlinear partial differential equation, namely, the Hamilton-Jacobi-Bellman equation. This nonlinear PDE must be solved backwards in time, and this computation is intractable for large scale systems. Under certain assumptions, and after applying a logarithmic transformation, an alternative characterization of the optimal policy can be given in terms of a path integral. Path Integral (PI) based control methods have recently been shown to provide elegant solutions to a broad class of stochastic optimal control problems. One of the implementation challenges with this formalism is the computation of the expectation of a cost functional over the trajectories of the unforced dynamics. Computing such expectation over trajectories that are sampled uniformly may induce numerical instabilities due to the exponentiation of the cost. Therefore, sampling of low-cost trajectories is essential for the practical implementation of PI-based methods. In this paper, we use incremental sampling-based algorithms to sample useful trajectories from the unforced system dynamics, and make a novel connection between Rapidly-exploring Random Trees (RRTs) and information-theoretic stochastic optimal control. We show the results from the numerical implementation of the proposed approach to several examples.