LG · Feb 2, 2022

Imitation Learning by Estimating Expertise of Demonstrators

Mark Beliaev, Andy Shih, Stefano Ermon, Dorsa Sadigh, Ramtin Pedarsani

arXiv:2202.01288v2 · 21.660 citations · Has Code

Originality: Highly original

AI Analysis

This paper addresses the performance loss that imitation learning suffers when demonstrators differ in expertise, offering a method that improves the learned policy in domains such as robotics and games.

The paper tackles imitation learning from multiple demonstrators with varying expertise. It develops a model that jointly learns a policy and estimates each demonstrator's expertise, which lets it learn from optimal behavior and filter out suboptimal behavior. The result is a single policy that can outperform even the best demonstrator. Across robotic and discrete environments, it beats competing methods in 21 out of 23 settings, with an average 7% and up to 60% improvement in final reward.

Many existing imitation learning datasets are collected from multiple demonstrators, each with different expertise at different parts of the environment. Yet, standard imitation learning algorithms typically treat all demonstrators as homogeneous, regardless of their expertise, absorbing the weaknesses of any suboptimal demonstrators. In this work, we show that unsupervised learning over demonstrator expertise can lead to a consistent boost in the performance of imitation learning algorithms. We develop and optimize a joint model over a learned policy and expertise levels of the demonstrators. This enables our model to learn from the optimal behavior and filter out the suboptimal behavior of each demonstrator. Our model learns a single policy that can outperform even the best demonstrator, and can be used to estimate the expertise of any demonstrator at any state. We illustrate our findings on real-robotic continuous control tasks from Robomimic and discrete environments such as MiniGrid and chess, out-performing competing methods in $21$ out of $23$ settings, with an average of $7\%$ and up to $60\%$ improvement in terms of the final reward.
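The idea of jointly fitting a policy and per-demonstrator expertise can be illustrated with a toy tabular sketch. This is not the paper's model, which the abstract does not specify. As a stand-in, it assumes each demonstrator follows the shared policy with probability `rho[k]` and otherwise acts uniformly at random, and it fits both quantities with plain EM. All names (`fit_policy_and_expertise`, `rho`, `pi`, the synthetic data) are hypothetical. Note that this simplified mixture pins down the ranking of demonstrators and the preferred action per state, but not the absolute expertise values.

```python
import numpy as np

def fit_policy_and_expertise(states, actions, demo_ids,
                             n_states, n_actions, n_demos, n_iters=200):
    """EM for a toy mixture: demonstrator k follows pi with prob rho[k], else uniform."""
    pi = np.full((n_states, n_actions), 1.0 / n_actions)  # shared policy estimate
    rho = np.full(n_demos, 0.5)                           # expertise estimates
    demo_counts = np.bincount(demo_ids, minlength=n_demos)
    for _ in range(n_iters):
        # E-step: probability that each sample came from the shared policy
        from_policy = rho[demo_ids] * pi[states, actions]
        from_noise = (1.0 - rho[demo_ids]) / n_actions
        resp = from_policy / (from_policy + from_noise)
        # M-step: expertise is the mean responsibility over a demonstrator's samples
        rho = np.bincount(demo_ids, weights=resp, minlength=n_demos) / demo_counts
        rho = np.clip(rho, 1e-6, 1.0 - 1e-6)
        # M-step: policy is the responsibility-weighted action frequency per state
        counts = np.full((n_states, n_actions), 1e-6)  # small smoothing term
        np.add.at(counts, (states, actions), resp)
        pi = counts / counts.sum(axis=1, keepdims=True)
    return pi, rho

# Synthetic data (assumption): three demonstrators with decreasing expertise.
rng = np.random.default_rng(0)
n_states, n_actions, n_per_demo = 5, 4, 2000
best_action = np.arange(n_states) % n_actions
true_rho = np.array([0.9, 0.5, 0.1])

states, actions, demo_ids = [], [], []
for k, r in enumerate(true_rho):
    s = rng.integers(n_states, size=n_per_demo)
    follows = rng.random(n_per_demo) < r
    a = np.where(follows, best_action[s], rng.integers(n_actions, size=n_per_demo))
    states.append(s)
    actions.append(a)
    demo_ids.append(np.full(n_per_demo, k))
states = np.concatenate(states)
actions = np.concatenate(actions)
demo_ids = np.concatenate(demo_ids)

pi, rho = fit_policy_and_expertise(states, actions, demo_ids,
                                   n_states, n_actions, len(true_rho))
```

In this sketch, samples from low-expertise demonstrators receive smaller weights when the policy is re-estimated, which is one simple way to "filter out" suboptimal behavior without labels. A state-dependent expertise, as the abstract describes, would require replacing the scalar `rho[k]` with a function of the state.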
