LG AI RO · Jun 10, 2022

Imitation Learning via Differentiable Physics

arXiv:2206.04873v14.64 citations · h-index: 27 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses slow and unstable training in imitation learning for robotics and control. It offers a more efficient and robust method, though the advance is incremental because it builds on existing differentiable physics techniques.

The paper tackles the inefficiency and instability of double-loop imitation learning methods by proposing Imitation Learning via Differentiable Physics (ILD). ILD integrates a differentiable physics simulator as a prior to enable single-loop training, which improves performance, convergence speed, and stability across continuous control and deformable object manipulation tasks.

Existing imitation learning (IL) methods such as inverse reinforcement learning (IRL) usually have a double-loop training process, alternating between learning a reward function and a policy, and tend to suffer from long training times and high variance. In this work, we identify the benefits of differentiable physics simulators and propose a new IL method, Imitation Learning via Differentiable Physics (ILD), which gets rid of the double-loop design and achieves significant improvements in final performance, convergence speed, and stability. ILD incorporates the differentiable physics simulator as a physics prior into its computational graph for policy learning. It unrolls the dynamics by sampling actions from a parameterized policy, minimizes the distance between the expert trajectory and the agent trajectory, and back-propagates the gradient into the policy via temporal physics operators. With the physics prior, ILD policies are not only transferable to unseen environment specifications but also yield higher final performance on a variety of tasks. In addition, ILD naturally forms a single-loop structure, which significantly improves stability and training speed. To simplify the complex optimization landscape induced by temporal physics operations, ILD dynamically selects the learning objectives for each state during optimization. In our experiments, we show that ILD outperforms state-of-the-art methods in a variety of continuous control tasks with Brax, requiring only one expert demonstration. In addition, ILD can be applied to challenging deformable object manipulation tasks and can generalize to unseen configurations.
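The single-loop idea described in the abstract (unroll the policy through a differentiable simulator, compare the resulting trajectory with the expert's, and back-propagate through the physics) can be illustrated with a minimal JAX sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy point-mass `step` function stands in for a Brax simulator, the linear `policy` is hypothetical, and matching each agent state to its nearest expert state is only one plausible reading of "dynamically selects the learning objectives for each state".

```python
import jax
import jax.numpy as jnp

DT = 0.1       # integration step of the toy simulator (assumed)
HORIZON = 20   # number of unrolled simulation steps (assumed)
LR = 0.05      # gradient-descent step size (assumed)

def step(state, action):
    """Toy differentiable physics: 1-D point mass, state = [position, velocity]."""
    vel = state[1] + DT * action
    pos = state[0] + DT * vel
    return jnp.stack([pos, vel])

def policy(params, state):
    """Hypothetical linear state-feedback policy returning a scalar force."""
    return jnp.dot(params["w"], state) + params["b"]

def rollout(params, init_state):
    """Unroll the policy through the simulator; returns states of shape (HORIZON, 2)."""
    def body(state, _):
        next_state = step(state, policy(params, state))
        return next_state, next_state
    _, states = jax.lax.scan(body, init_state, None, length=HORIZON)
    return states

def imitation_loss(params, init_state, expert_states):
    """Mean squared distance from each agent state to its nearest expert state."""
    agent_states = rollout(params, init_state)
    diffs = agent_states[:, None, :] - expert_states[None, :, :]
    sq_dists = jnp.sum(diffs ** 2, axis=-1)      # rows: agent steps, columns: expert steps
    return jnp.mean(jnp.min(sq_dists, axis=1))   # per-state target chosen by nearest match

@jax.jit
def train_step(params, init_state, expert_states):
    """One update: the gradient flows through every simulator step of the rollout."""
    loss, grads = jax.value_and_grad(imitation_loss)(params, init_state, expert_states)
    new_params = jax.tree_util.tree_map(lambda p, g: p - LR * g, params, grads)
    return new_params, loss

# A single "expert demonstration", generated here by a hand-picked damping controller.
init_state = jnp.array([1.0, 0.0])
expert_params = {"w": jnp.array([-1.0, -1.0]), "b": jnp.array(0.0)}
expert_states = rollout(expert_params, init_state)

# Single training loop: no reward model or inner RL loop is involved.
params = {"w": jnp.zeros(2), "b": jnp.array(0.0)}
for _ in range(100):
    params, loss = train_step(params, init_state, expert_states)
```

The point of the sketch is structural: the loss depends on the policy parameters only through the simulator, so `jax.value_and_grad` yields a policy gradient directly, without first learning a reward function as in the double-loop IRL setting.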
