SY Feb 1, 2016
Sample Efficient Path Integral Control under Uncertainty
Yunpeng Pan, Evangelos A. Theodorou, Michail Kontitsis
We present a data-driven optimal control framework that can be viewed as a generalization of the path integral (PI) control approach. We find iterative feedback control laws without parameterization, based on a probabilistic representation of the learned dynamics model. The proposed algorithm operates in a forward-backward manner, which differentiates it from other PI-related methods that perform forward sampling to find optimal controls. Our method uses significantly fewer samples to find optimal controls than other approaches within the PI control family, which rely on extensive sampling from given dynamics models or on trials on physical systems in a model-free fashion. In addition, the learned controllers can be generalized to new tasks without re-sampling, based on the compositionality theory for the linearly-solvable optimal control framework. We provide experimental results on three different systems and comparisons with state-of-the-art model-based methods to demonstrate the efficiency and generalizability of the proposed framework.
SY Dec 9, 2014
Model-based Path Integral Stochastic Control: A Bayesian Nonparametric Approach
Yunpeng Pan, Evangelos A. Theodorou, Michail Kontitsis
Over the last few years, sampling-based stochastic optimal control (SOC) frameworks have shown impressive performance in reinforcement learning (RL) with applications in robotics. However, such approaches require a large number of samples from many interactions with the physical system. To improve learning efficiency, we present a novel model-based and data-driven SOC framework based on the path integral formulation and Gaussian processes (GPs). The proposed approach learns explicit and time-varying optimal controls autonomously from limited sampled data. Based on this framework, we propose an iterative control scheme with improved applicability in higher-dimensional and more complex control tasks. We demonstrate the effectiveness and efficiency of the proposed framework using two nontrivial examples. Compared to state-of-the-art RL methods, the proposed framework features superior control learning efficiency.