LG ML | Jun 19, 2019

Wasserstein Adversarial Imitation Learning

Huang Xiao, Michael Herman, Joerg Wagner, Sebastian Ziesche, Jalal Etesami, Thai Hong Linh

arXiv:1906.08113v1 | 19.279 citations

Originality: Incremental advance

AI Analysis

This work addresses sample efficiency in imitation learning for robotics. It proposes a new method whose contribution is an incremental improvement in reward function design.

The paper tackles the problem of recovering an expert policy from demonstrations. It proposes Wasserstein Adversarial Imitation Learning, which connects inverse reinforcement learning to optimal transport to enable more general reward functions. In robotic experiments, it outperforms the baselines in average cumulative reward and is markedly more sample-efficient, requiring just one expert demonstration.

Imitation learning describes the problem of recovering an expert policy from demonstrations. While inverse reinforcement learning approaches are known to be very sample-efficient in terms of expert demonstrations, they usually require problem-dependent reward functions or a task-specific reward-function regularization. In this paper, we show a natural connection between inverse reinforcement learning approaches and Optimal Transport that enables more general reward functions with desirable properties (e.g., smoothness). Based on this observation, we propose a novel approach called Wasserstein Adversarial Imitation Learning. Our approach uses the Kantorovich potentials as a reward function and further leverages regularized optimal transport to enable large-scale applications. In several robotic experiments, our approach outperforms the baselines in terms of average cumulative rewards and shows a significant improvement in sample-efficiency, requiring just one expert demonstration.
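
To make the terms "Kantorovich potentials" and "regularized optimal transport" concrete, here is a minimal sketch of entropy-regularized optimal transport between two small sample sets, solved with log-domain Sinkhorn iterations. It is not the paper's training procedure. The function name `sinkhorn_potentials`, the squared Euclidean cost, the uniform sample weights, and the values of `eps` and `n_iters` are all assumptions chosen for illustration.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def sinkhorn_potentials(x, y, eps=1.0, n_iters=200):
    """Dual potentials (f, g) and transport plan for entropic OT.

    x: (n, d) samples, e.g. expert state-action pairs (assumed).
    y: (m, d) samples, e.g. policy state-action pairs (assumed).
    Both sample sets get uniform weights (an assumption).
    """
    n, m = len(x), len(y)
    # Squared Euclidean ground cost between every pair of samples.
    cost = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    log_a = np.full(n, -np.log(n))
    log_b = np.full(m, -np.log(m))
    f = np.zeros(n)
    g = np.zeros(m)
    for _ in range(n_iters):
        # Alternating updates, each enforcing one marginal constraint.
        f = -eps * logsumexp((g[None, :] - cost) / eps + log_b[None, :], axis=1)
        g = -eps * logsumexp((f[:, None] - cost) / eps + log_a[:, None], axis=0)
    # Plan implied by the potentials: P_ij = a_i * b_j * exp((f_i + g_j - C_ij) / eps).
    log_plan = (f[:, None] + g[None, :] - cost) / eps + log_a[:, None] + log_b[None, :]
    return f, g, np.exp(log_plan)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    expert = rng.normal(size=(5, 2))        # stand-in for expert samples
    policy = rng.normal(size=(7, 2)) + 1.0  # stand-in for policy samples
    f, g, plan = sinkhorn_potentials(expert, policy)
    print("potentials on policy samples:", g)
```

In this discrete setting the potentials are defined only on the given samples. How a potential is turned into a reward for policy optimization, including the sign convention, is described in the paper and is not reproduced by this sketch.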
