LG ML, Jun 18, 2020

Reparameterized Variational Divergence Minimization for Stable Imitation

Dilip Arumugam, Debadeepta Dey, Alekh Agarwal, Asli Celikyilmaz, Elnaz Nouri, Bill Dolan

arXiv:2006.10810v1, 3.33 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of stable imitation learning from observation for robotics and control applications, representing an incremental improvement over existing methods.

The paper tackles imitation learning from observation (ILO). It addresses numerical instabilities in f-divergence minimization with a reparameterization trick, yielding ILO algorithms that outperform baselines and more closely match expert performance in low-dimensional continuous-control tasks.

While recent state-of-the-art results for adversarial imitation-learning algorithms are encouraging, recent works exploring the imitation learning from observation (ILO) setting, where trajectories only contain expert observations, have not been met with the same success. Inspired by recent investigations of $f$-divergence manipulation for the standard imitation learning setting (Ke et al., 2019; Ghasemipour et al., 2019), we here examine the extent to which variations in the choice of probabilistic divergence may yield more performant ILO algorithms. We unfortunately find that $f$-divergence minimization through reinforcement learning is susceptible to numerical instabilities. We contribute a reparameterization trick for adversarial imitation learning to alleviate the optimization challenges of the promising $f$-divergence minimization framework. Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.
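
For readers unfamiliar with the term, the following is a sketch of the standard textbook definition of an $f$-divergence and its variational lower bound, which is the general form that adversarial imitation methods of this kind optimize. It is not taken from the paper itself: the symbols $P$, $Q$, $p$, $q$, $T$, and $f^*$ are notation assumed here for illustration, and the paper's specific reparameterization is not reproduced.

```latex
% Assumed notation: P and Q are distributions with densities p and q,
% f is a convex function with f(1) = 0, and f^* is its convex conjugate.
\begin{align}
  D_f(P \,\|\, Q) &= \mathbb{E}_{x \sim Q}\!\left[ f\!\left( \frac{p(x)}{q(x)} \right) \right], \\
  f^*(t) &= \sup_{u} \,\bigl\{ u t - f(u) \bigr\}, \\
  D_f(P \,\|\, Q) &\ge \sup_{T} \;\Bigl( \mathbb{E}_{x \sim P}\bigl[ T(x) \bigr]
                      - \mathbb{E}_{x \sim Q}\bigl[ f^*\bigl( T(x) \bigr) \bigr] \Bigr).
\end{align}
```

In an imitation setting, one would typically take $P$ and $Q$ to be the expert's and the learner's distributions (over observations or observation transitions in ILO), with $T$ playing the role of a learned discriminator; that mapping is an assumption made here for orientation rather than a statement of the paper's exact formulation.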
