LG AI MLJun 23, 2020

Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization

Paul Barde, Julien Roy, Wonseok Jeon, Joelle Pineau, Christopher Pal, Derek Nowrouzezahrai

arXiv:2006.13258v614.029 citationsHas Code

Originality Highly original

AI Analysis

This method addresses the practical challenges of adversarial imitation learning for researchers and practitioners by simplifying implementation and improving efficiency.

The paper tackles the instability and inefficiency of adversarial imitation learning by proposing a discriminator that directly learns the optimal policy, eliminating the need for separate reinforcement learning steps. This approach reduces computational burden by half and achieves competitive performance on various tasks.

Adversarial Imitation Learning alternates between learning a discriminator -- which tells apart expert's demonstrations from generated ones -- and a generator's policy to produce trajectories that can fool this discriminator. This alternated optimization is known to be delicate in practice since it compounds unstable adversarial training with brittle and sample-inefficient reinforcement learning. We propose to remove the burden of the policy optimization steps by leveraging a novel discriminator formulation. Specifically, our discriminator is explicitly conditioned on two policies: the one from the previous generator's iteration and a learnable policy. When optimized, this discriminator directly learns the optimal generator's policy. Consequently, our discriminator's update solves the generator's optimization problem for free: learning a policy that imitates the expert does not require an additional optimization loop. This formulation effectively cuts by half the implementation and computational burden of Adversarial Imitation Learning algorithms by removing the Reinforcement Learning phase altogether. We show on a variety of tasks that our simpler approach is competitive to prevalent Imitation Learning methods.

View on arXiv PDF Code

Similar