LG ML

Sep 15, 2019

VILD: Variational Imitation Learning with Diverse-quality Demonstrations

Voot Tangkaratt, Bo Han, Mohammad Emtiyaz Khan, Masashi Sugiyama

arXiv:1909.06769v14.822 citations

Originality: Incremental advance

AI Analysis

This work addresses scalable and data-efficient imitation learning in realistic settings where demonstration quality is diverse. It is an incremental improvement over prior methods.

The paper tackles the problem of imitation learning from demonstrations of varying quality, where the expertise level of demonstrators is unknown, by proposing VILD, a method that models demonstrator expertise probabilistically and estimates it alongside a reward function. Experiments on continuous-control benchmarks show that VILD outperforms state-of-the-art methods.

The goal of imitation learning (IL) is to learn a good policy from high-quality demonstrations. However, the quality of demonstrations in reality can be diverse, since it is easier and cheaper to collect demonstrations from a mix of experts and amateurs. IL in such situations can be challenging, especially when the level of demonstrators' expertise is unknown. We propose a new IL method called variational imitation learning with diverse-quality demonstrations (VILD), where we explicitly model the level of demonstrators' expertise with a probabilistic graphical model and estimate it along with a reward function. We show that a naive approach to estimation is not suitable for large state and action spaces, and we fix its issues with a variational approach that can be easily implemented using existing reinforcement learning methods. Experiments on continuous-control benchmarks demonstrate that VILD outperforms state-of-the-art methods. Our work enables scalable and data-efficient IL under more realistic settings than before.
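To make the idea of estimating demonstrator expertise alongside the learned quantity more concrete, here is a toy sketch. It is not the VILD algorithm: it swaps the reward function for a one-parameter linear policy, and it assumes, purely for illustration, that each demonstrator's action is the optimal action plus Gaussian noise whose standard deviation reflects that demonstrator's (unknown) expertise. The names `simulate` and `fit` and all numeric settings are hypothetical. The fit alternates between a noise-weighted least-squares update of the policy and a per-demonstrator noise estimate, so noisier demonstrators are automatically down-weighted.

```python
import random


def simulate(true_w, noise_stds, n_per_demo, rng):
    """Generate (demonstrator_id, state, action) triples.

    Assumption (illustrative only): demonstrator k plays the optimal
    action true_w * s corrupted by Gaussian noise with std noise_stds[k].
    """
    data = []
    for k, std in enumerate(noise_stds):
        for _ in range(n_per_demo):
            s = rng.uniform(-1.0, 1.0)
            data.append((k, s, true_w * s + rng.gauss(0.0, std)))
    return data


def fit(data, n_demos, n_iters=20):
    """Jointly estimate the policy weight and each demonstrator's noise std."""
    var = [1.0] * n_demos  # start by trusting every demonstrator equally
    w = 0.0
    for _ in range(n_iters):
        # Policy step: weighted least squares for a = w * s,
        # with each sample weighted by 1 / (its demonstrator's variance).
        num = sum(s * a / var[k] for k, s, a in data)
        den = sum(s * s / var[k] for k, s, a in data)
        w = num / den
        # Expertise step: residual variance of each demonstrator
        # around the current policy.
        sq = [0.0] * n_demos
        cnt = [0] * n_demos
        for k, s, a in data:
            sq[k] += (a - w * s) ** 2
            cnt[k] += 1
        var = [max(sq[k] / cnt[k], 1e-8) for k in range(n_demos)]
    return w, [v ** 0.5 for v in var]


rng = random.Random(0)
data = simulate(true_w=2.0, noise_stds=[0.1, 0.5, 2.0], n_per_demo=500, rng=rng)
w_hat, std_hat = fit(data, n_demos=3)
print("estimated policy weight:", w_hat)
print("estimated noise std per demonstrator:", std_hat)
```

In this toy setting the estimated noise levels should rank the demonstrators from most to least expert, and the policy estimate should be dominated by the low-noise demonstrator. The abstract's point is that this kind of direct estimation does not scale to large state and action spaces, which is what motivates the variational approach.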
