ABC: Adversarial Behavioral Cloning for Offline Mode-Seeking Imitation Learning
This work addresses a specific limitation of offline imitation learning for robotics and control tasks. The contribution is incremental, since it builds directly on existing BC and GAN methods.
The paper tackles the problem that behavioral cloning (BC) is mean-seeking when the learner's policy is a Gaussian, which can lead to suboptimal performance. It introduces Adversarial Behavioral Cloning (ABC), which incorporates GAN training to obtain mode-seeking behavior. The results show that ABC outperforms standard BC on toy domains and on a Hopper-based domain, consistent with its improved mode-seeking behavior.
Given a dataset of expert agent interactions with an environment of interest, a viable method to extract an effective agent policy is to estimate the maximum-likelihood policy indicated by this data. This approach is commonly referred to as behavioral cloning (BC). In this work, we describe a key disadvantage of BC that arises from the maximum-likelihood objective function: namely, BC is mean-seeking with respect to the state-conditional expert action distribution when the learner's policy is represented with a Gaussian. To address this issue, we introduce a modified version of BC, Adversarial Behavioral Cloning (ABC), that exhibits mode-seeking behavior by incorporating elements of GAN (generative adversarial network) training. We evaluate ABC on toy domains and a domain based on Hopper from the DeepMind Control Suite, and show that it outperforms standard BC by being mode-seeking in nature.
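The sketch below makes the mean-seeking versus mode-seeking distinction concrete for a single state. It assumes a hypothetical bimodal expert whose actions cluster near -1 and +1. The maximum-likelihood Gaussian fit, which is what BC computes, lands between the two modes, where the expert almost never acts. As a contrast, it fits a Gaussian by minimizing the reverse KL divergence, a standard mode-seeking objective. This reverse-KL fit is a stand-in chosen for illustration, not ABC's GAN-based objective, and the helper names `expert_pdf` and `reverse_kl` are hypothetical.

```python
import numpy as np


def gauss_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian N(mu, sigma^2)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def expert_pdf(x):
    """Hypothetical bimodal expert action density for one fixed state."""
    return 0.5 * gauss_pdf(x, -1.0, 0.1) + 0.5 * gauss_pdf(x, 1.0, 0.1)


# Synthetic expert dataset: half the actions near -1, half near +1.
rng = np.random.default_rng(0)
actions = np.concatenate([rng.normal(-1.0, 0.1, 500), rng.normal(1.0, 0.1, 500)])

# BC with a Gaussian policy: the maximum-likelihood fit is the sample mean and
# (biased) sample std. This minimizes forward KL(p_expert || q): mean-seeking.
mle_mu, mle_sigma = actions.mean(), actions.std()

# Mode-seeking contrast (illustrative stand-in, not ABC): minimize reverse
# KL(q || p_expert) over Gaussians q by grid search, integrating numerically.
xs = np.linspace(-3.0, 3.0, 6001)
dx = xs[1] - xs[0]
log_p = np.log(expert_pdf(xs))


def reverse_kl(mu, sigma):
    """Numerical estimate of KL(N(mu, sigma^2) || p_expert) on the grid xs."""
    q = gauss_pdf(xs, mu, sigma)
    mask = q > 0.0  # points where q underflows to 0 contribute nothing
    return np.sum(q[mask] * (np.log(q[mask]) - log_p[mask])) * dx


candidates = [
    (mu, s)
    for mu in np.linspace(-2.0, 2.0, 81)
    for s in np.linspace(0.05, 1.5, 30)
]
rk_mu, rk_sigma = min(candidates, key=lambda c: reverse_kl(*c))

print(f"MLE (BC) fit:   mu={mle_mu:+.3f}, sigma={mle_sigma:.3f}")
print(f"Reverse-KL fit: mu={rk_mu:+.3f}, sigma={rk_sigma:.3f}")
print(f"expert density at MLE mean: {expert_pdf(mle_mu):.2e}")
print(f"expert density at +1.0:     {expert_pdf(1.0):.2e}")
```

The forward KL implicit in maximum likelihood heavily penalizes assigning low probability to any expert action, so a single Gaussian stretches to cover both modes and centers on an action the expert rarely takes. Reverse KL instead penalizes placing mass where the expert has little, so the fitted Gaussian commits to one mode (which one is arbitrary here, because the example is symmetric).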