Path Integral Networks: End-to-End Differentiable Optimal Control
This work addresses the challenge of integrating planning and learning in continuous control tasks, offering a method applicable to imitation and reinforcement learning, though the contribution is incremental because it builds on existing Path Integral algorithms.
The paper tackles the problem of learning to plan for optimal control by introducing Path Integral Networks (PI-Net), a differentiable recurrent network based on the Path Integral algorithm, which learns dynamics and cost models end-to-end through back-propagation. Preliminary results show it can mimic control demonstrations for simulated linear systems and pendulum swing-up tasks, learning latent models from the data.
In this paper, we introduce Path Integral Networks (PI-Net), a recurrent network representation of the Path Integral optimal control algorithm. The network includes both system dynamics and cost models, which are used for planning based on optimal control. PI-Net is fully differentiable, so both the dynamics and cost models can be learned end-to-end by back-propagation and stochastic gradient descent. Because of this, PI-Net can learn to plan. PI-Net has several advantages: it can generalize to unseen states thanks to planning, it can be applied to continuous control tasks, and it allows for a wide variety of learning schemes, including imitation and reinforcement learning. Preliminary experimental results show that PI-Net, trained by imitation learning, can mimic control demonstrations for two simulated problems: a linear system and a pendulum swing-up problem. We also show that PI-Net is able to learn dynamics and cost models latent in the demonstrations.
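To make the planning step concrete, the following is a minimal sketch of one generic path-integral-style control update: sample perturbed control sequences, roll each one out through a dynamics model, score it with a cost model, and average the perturbations with exponential weights. It is not the authors' implementation. The scalar linear dynamics, the quadratic cost, and every name and parameter (`linear_dynamics`, `quadratic_cost`, `path_integral_update`, `lam`, `noise_std`) are assumptions chosen for illustration, and the sketch uses fixed hand-written models rather than the learned, differentiable ones that PI-Net trains.

```python
import math
import random


def linear_dynamics(x, u, a=0.9, b=0.5):
    """Hypothetical scalar linear system: x' = a*x + b*u."""
    return a * x + b * u


def quadratic_cost(x, u, q=1.0, r=0.1):
    """Hypothetical quadratic cost on state and control."""
    return q * x * x + r * u * u


def rollout_cost(x0, controls, dynamics, cost):
    """Accumulate the cost of applying `controls` starting from state x0."""
    x, total = x0, 0.0
    for u in controls:
        total += cost(x, u)
        x = dynamics(x, u)
    return total + cost(x, 0.0)  # cost of the final state, with zero control


def path_integral_update(u_seq, x0, dynamics, cost,
                         n_samples=64, noise_std=0.5, lam=1.0, rng=None):
    """One sampling-based update of a control sequence (illustrative only)."""
    if rng is None:
        rng = random.Random(0)
    # Sample one Gaussian perturbation sequence per rollout.
    noises = [[rng.gauss(0.0, noise_std) for _ in u_seq]
              for _ in range(n_samples)]
    # Evaluate each perturbed control sequence under the models.
    costs = [rollout_cost(x0, [u + e for u, e in zip(u_seq, eps)],
                          dynamics, cost)
             for eps in noises]
    # Exponential (soft-min) weights; subtracting the minimum cost
    # avoids underflow and does not change the normalized weights.
    c_min = min(costs)
    weights = [math.exp(-(c - c_min) / lam) for c in costs]
    z = sum(weights)
    weights = [w / z for w in weights]
    # Shift each control by the weighted average of its perturbations.
    return [u + sum(w * eps[t] for w, eps in zip(weights, noises))
            for t, u in enumerate(u_seq)]


if __name__ == "__main__":
    controls = [0.0] * 10
    for _ in range(5):  # repeated updates refine the plan
        controls = path_integral_update(controls, 2.0,
                                        linear_dynamics, quadratic_cost)
    print(controls)
```

Every operation in this update (rollout, exponentiation, normalization, weighted sum) is smooth in the model parameters, which is the property that would let the same computation be unrolled as a recurrent network and trained by back-propagation in an automatic differentiation framework.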