SY · Feb 1, 2016
Sample Efficient Path Integral Control under Uncertainty
Yunpeng Pan, Evangelos A. Theodorou, Michail Kontitsis
We present a data-driven optimal control framework that can be viewed as a generalization of the path integral (PI) control approach. We find iterative feedback control laws without parameterization, based on a probabilistic representation of the learned dynamics model. The proposed algorithm operates in a forward-backward manner, which differentiates it from other PI-related methods that perform forward sampling to find optimal controls. Our method uses significantly fewer samples to find optimal controls than other approaches within the PI control family, which rely on extensive sampling from given dynamics models or on trials on physical systems in a model-free fashion. In addition, the learned controllers can be generalized to new tasks without re-sampling, based on the compositionality theory for the linearly-solvable optimal control framework. We provide experimental results on three different systems and comparisons with state-of-the-art model-based methods to demonstrate the efficiency and generalizability of the proposed framework.
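For context, the forward-sampling step used by the PI methods this abstract contrasts itself with can be sketched as an exponentiated-cost average over noisy rollouts. This is a generic illustration, not the forward-backward algorithm proposed in the paper; the function name `path_integral_update`, the temperature `lam`, and the toy quadratic cost are all assumptions made for the example.

```python
import numpy as np

def path_integral_update(costs, noise, lam=1.0):
    """Generic forward-sampling PI step (illustrative only).

    costs: (K,) total cost of each sampled rollout.
    noise: (K, T, m) control perturbations applied in each rollout.
    lam:   temperature; smaller values concentrate weight on cheap rollouts.
    """
    costs = np.asarray(costs, dtype=float)
    # Subtract the minimum cost before exponentiating for numerical stability.
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    # Control correction: cost-weighted average of the perturbations.
    delta_u = np.tensordot(w, noise, axes=(0, 0))
    return w, delta_u

rng = np.random.default_rng(0)
K, T, m = 64, 10, 1                      # rollouts, horizon, control dimension
noise = rng.normal(size=(K, T, m))
# Hypothetical toy cost: penalize perturbations far from 0.5 at every step.
costs = ((noise - 0.5) ** 2).sum(axis=(1, 2))
weights, delta_u = path_integral_update(costs, noise, lam=2.0)
```

The number of rollouts `K` needed for a low-variance estimate is what makes purely sampling-based variants expensive, which is the cost the abstract says the proposed method reduces.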
SY · Jun 15, 2016
Nonparametric Infinite Horizon Kullback-Leibler Stochastic Control
Yunpeng Pan, Evangelos Theodorou
We present two nonparametric approaches to Kullback-Leibler (KL) control, also known as the linearly-solvable Markov decision problem (LMDP), based on Gaussian processes (GP) and the Nyström approximation. Compared to recently developed parametric methods, the proposed data-driven frameworks feature accurate function approximation and efficient on-line operations. Theoretically, we derive the mathematical connection between KL control based on dynamic programming and earlier work in control theory that relies on information-theoretic dualities for the infinite time horizon case. Algorithmically, we give explicit optimal control policies in nonparametric forms, and propose on-line update schemes with budgeted computational costs. Numerical results demonstrate the effectiveness and usefulness of the proposed frameworks.
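To make the "linearly-solvable" structure concrete, here is a minimal tabular sketch of the infinite-horizon, average-cost LMDP in the standard formulation: the desirability function z satisfies the linear eigenvalue problem exp(-c) z = diag(exp(-q)) P z, and the optimal policy reweights the passive dynamics P by z. This is a discrete toy, not the GP or Nyström method of the paper; the function name, the five-state ring, and the state costs `q` are assumptions made for illustration.

```python
import numpy as np

def solve_lmdp_average_cost(P, q, iters=5000, tol=1e-13):
    """Power iteration for the principal eigenpair of diag(exp(-q)) @ P.

    P: (n, n) passive transition matrix (rows sum to 1).
    q: (n,) state costs.
    Returns the desirability z, the average cost c, and the optimal policy.
    """
    G = np.exp(-q)
    z = np.ones(len(q)) / np.sqrt(len(q))
    for _ in range(iters):
        z_new = G * (P @ z)
        z_new /= np.linalg.norm(z_new)
        converged = np.max(np.abs(z_new - z)) < tol
        z = z_new
        if converged:
            break
    lam = (G * (P @ z)) @ z / (z @ z)     # eigenvalue estimate, equals exp(-c)
    policy = P * z[None, :]               # u*(x'|x) proportional to p(x'|x) z(x')
    policy /= policy.sum(axis=1, keepdims=True)
    return z, -np.log(lam), policy

# Hypothetical example: lazy random walk on a ring of five states.
n = 5
P = np.zeros((n, n))
for i in range(n):
    P[i, i] = 0.5
    P[i, (i - 1) % n] = 0.25
    P[i, (i + 1) % n] = 0.25
q = np.array([0.0, 1.0, 2.0, 1.0, 0.5])
z, avg_cost, policy = solve_lmdp_average_cost(P, q)
```

Because the problem reduces to a linear eigenproblem in z, function approximators for z (such as the nonparametric ones named in the abstract) can replace the table used here.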
SY · Feb 15, 2017
Pseudospectral Model Predictive Control under Partially Learned Dynamics
Manan Gandhi, Yunpeng Pan, Evangelos Theodorou
Trajectory optimization of a controlled dynamical system is an essential part of autonomy; however, many trajectory optimization techniques are limited by the fidelity of the underlying parametric model. In the field of robotics, a lack of model knowledge can be overcome with machine learning techniques, utilizing measurements to build a dynamical model from the data. This paper aims to take the middle ground between these two approaches by introducing a semi-parametric representation of the underlying system dynamics. Our goal is to leverage the considerable information contained in a traditional physics-based model and combine it with a data-driven, non-parametric regression technique known as a Gaussian process. Integrating this semi-parametric model with model predictive pseudospectral control, we demonstrate this technique on both a cart pole and a quadrotor simulation with unmodeled damping and parametric error. In order to manage parametric uncertainty, we introduce an algorithm that utilizes Sparse Spectrum Gaussian Processes (SSGP) for online learning after each rollout. We implement this online learning technique on a cart pole and a quadrotor, then demonstrate the use of online learning and obstacle avoidance for the Dubins vehicle dynamics.
ML · Jun 25, 2018
Propagating Uncertainty through the tanh Function with Application to Reservoir Computing
Manan Gandhi, Keuntaek Lee, Yunpeng Pan et al.
Many neural networks use the tanh activation function; however, the problem of computing the output distribution of such networks when the input is a probability distribution has not yet been addressed. One important example is the initialization of the echo state network in reservoir computing, where random initialization of the reservoir requires time to wash out the initial conditions, thereby wasting precious data and computational resources. Motivated by this problem, we propose a novel solution utilizing a moment-based approach to propagate uncertainty through an Echo State Network to reduce the washout time. In this work, we contribute two new methods to propagate uncertainty through the tanh activation function and propose the Probabilistic Echo State Network (PESN), a method that is shown to have better average performance than deterministic Echo State Networks given the random initialization of reservoir states. Additionally, we test single- and multi-step uncertainty propagation of our method on two regression tasks and show that we are able to recover means and variances similar to those computed by Monte Carlo simulations.
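The quantity being propagated can be illustrated with a scalar Gaussian input: the sketch below computes the mean and variance of tanh(x) for x ~ N(mu, sigma^2) by Gauss-Hermite quadrature and checks them against a Monte Carlo estimate, the kind of baseline the abstract mentions. Neither function is one of the two methods contributed by the paper; the function names, quadrature order, and sample count are assumptions chosen for illustration.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def tanh_moments_quadrature(mu, sigma, order=40):
    """Mean and variance of tanh(x), x ~ N(mu, sigma^2), via quadrature."""
    # hermegauss uses the weight exp(-x^2 / 2); dividing by sqrt(2*pi)
    # turns the weights into standard-normal probabilities.
    nodes, weights = hermegauss(order)
    weights = weights / np.sqrt(2.0 * np.pi)
    y = np.tanh(mu + sigma * nodes)
    mean = np.sum(weights * y)
    var = np.sum(weights * y ** 2) - mean ** 2
    return mean, var

def tanh_moments_monte_carlo(mu, sigma, n=200_000, seed=0):
    """Sampling-based reference estimate of the same two moments."""
    rng = np.random.default_rng(seed)
    y = np.tanh(rng.normal(mu, sigma, size=n))
    return y.mean(), y.var()

mean_q, var_q = tanh_moments_quadrature(0.3, 1.0)
mean_mc, var_mc = tanh_moments_monte_carlo(0.3, 1.0)
```

In an echo state network the same propagation would have to be applied to a vector-valued state at every time step, which is why closed-form or moment-based approximations are attractive compared with repeated sampling.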
RO · Sep 21, 2017
Agile Autonomous Driving using End-to-End Deep Imitation Learning
Yunpeng Pan, Ching-An Cheng, Kamil Saigol et al.
We present an end-to-end imitation learning system for agile, off-road autonomous driving using only low-cost sensors. By imitating a model predictive controller equipped with advanced sensors, we train a deep neural network control policy to map raw, high-dimensional observations to continuous steering and throttle commands. Compared with recent approaches to similar tasks, our method requires neither state estimation nor on-the-fly planning to navigate the vehicle. Our approach relies on, and experimentally validates, recent imitation learning theory. Empirically, we show that policies trained with online imitation learning overcome well-known challenges related to covariate shift and generalize better than policies trained with batch imitation learning. Building on these insights, our autonomous driving system demonstrates successful high-speed off-road driving, matching state-of-the-art performance.
RO · Aug 22, 2016
Adaptive Probabilistic Trajectory Optimization via Efficient Approximate Inference
Yunpeng Pan, Xinyan Yan, Evangelos Theodorou et al.
Robotic systems must be able to quickly and robustly make decisions when operating in uncertain and dynamic environments. While Reinforcement Learning (RL) can be used to compute optimal policies with little prior knowledge about the environment, it suffers from slow convergence. An alternative approach is Model Predictive Control (MPC), which optimizes policies quickly, but also requires accurate models of the system dynamics and environment. In this paper we propose a new approach, adaptive probabilistic trajectory optimization, that combines the benefits of RL and MPC. Our method uses scalable approximate inference to learn and update probabilistic models in an online, incremental fashion while also computing optimal control policies via successive local approximations. We present two variations of our algorithm based on the Sparse Spectrum Gaussian Process (SSGP) model, and we test our algorithm on three learning tasks, demonstrating the effectiveness and efficiency of our approach.
LG · Jul 15, 2016
Learning from Conditional Distributions via Dual Embeddings
Bo Dai, Niao He, Yunpeng Pan et al.
Many machine learning tasks, such as learning with invariance and policy evaluation in reinforcement learning, can be characterized as problems of learning from conditional distributions. In such problems, each sample $x$ itself is associated with a conditional distribution $p(z|x)$ represented by samples $\{z_i\}_{i=1}^M$, and the goal is to learn a function $f$ that links these conditional distributions to target values $y$. These learning problems become very challenging when we have only limited samples or, in the extreme case, only one sample from each conditional distribution. Commonly used approaches either assume that $z$ is independent of $x$, or require an overwhelmingly large number of samples from each conditional distribution. To address these challenges, we propose a novel approach that employs a new min-max reformulation of the problem of learning from conditional distributions. With this reformulation, we only need to deal with the joint distribution $p(z,x)$. We also design an efficient learning algorithm, Embedding-SGD, and establish theoretical sample complexity for such problems. Finally, our numerical experiments on both synthetic and real-world datasets show that the proposed approach can significantly improve over the existing algorithms.
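One way to see why a min-max form removes the nested conditional expectation is the standard Fenchel-conjugate argument, sketched here under assumptions not stated in the abstract: the loss $\ell(y, v)$ is convex and closed in its second argument, $\ell_y^*(u) = \sup_v \{uv - \ell(y, v)\}$ denotes its conjugate, and the dual function $u(x, y)$ ranges over a class rich enough to exchange the maximization with the expectation. The symbols $\ell$ and $u$ are introduced for this illustration only.

$$
\min_f \; \mathbb{E}_{x,y}\Big[\ell\big(y,\ \mathbb{E}_{z|x}[f(z,x)]\big)\Big]
\;=\;
\min_f \, \max_{u} \; \mathbb{E}_{x,y,z}\big[u(x,y)\, f(z,x)\big] \;-\; \mathbb{E}_{x,y}\big[\ell_y^*\big(u(x,y)\big)\big]
$$

On the right-hand side $f$ enters linearly inside a single expectation, so an unbiased stochastic gradient can be formed from one sample of $z$ per $x$, whereas the left-hand side requires an accurate inner average over $p(z|x)$ before the loss is applied.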
SY · Dec 9, 2014
Model-based Path Integral Stochastic Control: A Bayesian Nonparametric Approach
Yunpeng Pan, Evangelos A. Theodorou, Michail Kontitsis
Over the last few years, sampling-based stochastic optimal control (SOC) frameworks have shown impressive performance in reinforcement learning (RL) with applications in robotics. However, such approaches require a large number of samples from many interactions with the physical systems. To improve learning efficiency, we present a novel model-based and data-driven SOC framework based on the path integral formulation and Gaussian processes (GPs). The proposed approach learns explicit and time-varying optimal controls autonomously from limited sampled data. Based on this framework, we propose an iterative control scheme with improved applicability in higher-dimensional and more complex control tasks. We demonstrate the effectiveness and efficiency of the proposed framework using two nontrivial examples. Compared to state-of-the-art RL methods, the proposed framework features superior control learning efficiency.