Robust Maximum Entropy Behavior Cloning
This work is significant for improving the robustness of imitation learning algorithms for practitioners who rely on expert demonstrations, especially when data quality cannot be guaranteed.
This paper addresses the problem of adversarial demonstrations in imitation learning by proposing a framework that autonomously detects such demonstrations and excludes them from the dataset. The method formulates a min-max problem that leverages the entropy of the model to assign a weight to each demonstration, so that the policy is learned from the correct demonstrations alone or from a mixture of correct demonstrations.
Imitation learning (IL) algorithms use expert demonstrations to learn a specific task. Most existing approaches assume that all expert demonstrations are reliable and trustworthy, but what if the given dataset contains some adversarial demonstrations? Such demonstrations may result in poor decision-making performance. We propose a novel general framework that generates a policy directly from demonstrations while autonomously detecting the adversarial demonstrations and excluding them from the dataset. At the same time, it is sample- and time-efficient and does not require a simulator. To model such adversarial demonstrations, we propose a min-max problem that leverages the entropy of the model to assign a weight to each demonstration. This allows us to learn the behavior using only the correct demonstrations or a mixture of correct demonstrations.
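To make the idea of per-demonstration weights concrete, the following is a minimal sketch and not the paper's min-max formulation, which the abstract does not spell out. It substitutes a simpler alternating heuristic: a tabular softmax policy is fit by weighted behavior cloning, and each demonstration's weight is then reset in proportion to exp(-loss / temperature). The function names, the tabular setting, the `temperature` parameter, and the toy data are all assumptions introduced for illustration.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def demo_nll(logits, demo):
    """Mean negative log-likelihood of one demonstration under the policy."""
    states, actions = demo
    probs = softmax(logits[states])
    picked = probs[np.arange(len(actions)), actions]
    return -np.mean(np.log(picked + 1e-12))

def fit_weighted_bc(demos, n_states, n_actions, temperature=0.5,
                    outer_iters=20, inner_iters=200, lr=0.5):
    """Alternate between a weighted-BC policy step and a reweighting step.

    Hypothetical illustration only: this is a loss-based reweighting
    heuristic, not the min-max objective proposed in the paper.
    """
    weights = np.full(len(demos), 1.0 / len(demos))
    logits = np.zeros((n_states, n_actions))
    for _ in range(outer_iters):
        # Policy step: gradient descent on the weighted negative log-likelihood.
        for _ in range(inner_iters):
            grad = np.zeros_like(logits)
            for w_i, (states, actions) in zip(weights, demos):
                g = softmax(logits[states])
                g[np.arange(len(actions)), actions] -= 1.0  # d NLL / d logits
                np.add.at(grad, states, w_i * g / len(actions))
            logits -= lr * grad
        # Weight step: demonstrations the policy explains poorly lose weight.
        losses = np.array([demo_nll(logits, d) for d in demos])
        weights = softmax(-losses / temperature)
    return logits, weights

# Toy data (assumed): four demonstrations take action 0 in every state,
# one adversarial demonstration takes action 1 in every state.
states = np.array([0, 1, 2])
demos = [(states, np.zeros(3, dtype=int)) for _ in range(4)]
demos.append((states, np.ones(3, dtype=int)))

logits, weights = fit_weighted_bc(demos, n_states=3, n_actions=2)
print("demonstration weights:", np.round(weights, 4))
print("greedy actions:", logits.argmax(axis=1))
```

In this toy setting the minority demonstration ends up with a weight near zero, so the learned policy follows the majority behavior. The heuristic relies on adversarial demonstrations being a minority that disagrees with the rest; it is meant only to illustrate what assigning and using demonstration weights looks like in practice.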