MobILE: Model-Based Imitation Learning From Observation Alone
This work addresses a challenging problem in robotics and AI: imitation learning when the expert's actions are unavailable. It offers a provably efficient solution, though the contribution is somewhat incremental because it builds on existing imitation learning methods.
The paper tackles imitation learning from observations alone (ILFO), where only expert states are available without actions, by proposing a model-based framework called MobILE that integrates optimism under uncertainty into distribution matching. It demonstrates strong theoretical guarantees for certain MDP dynamics and shows ILFO is strictly harder than standard imitation learning with an exponential sample complexity separation, supported by experiments on OpenAI Gym tasks.
This paper studies Imitation Learning from Observations alone (ILFO), where the learner is presented with expert demonstrations that consist only of states visited by an expert (without access to actions taken by the expert). We present a provably efficient model-based framework, MobILE, to solve the ILFO problem. MobILE involves carefully trading off strategic exploration against imitation; this is achieved by integrating the idea of optimism in the face of uncertainty into the distribution matching imitation learning (IL) framework. We provide a unified analysis for MobILE, and demonstrate that MobILE enjoys strong performance guarantees for classes of MDP dynamics that satisfy certain well-studied notions of structural complexity. We also show that the ILFO problem is strictly harder than the standard IL problem by presenting an exponential sample complexity separation between IL and ILFO. We complement these theoretical results with experimental simulations on benchmark OpenAI Gym tasks that indicate the efficacy of MobILE. Code for implementing the MobILE framework is available at https://github.com/rahulkidambi/MobILE-NeurIPS2021.
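To make the trade-off between exploration and imitation concrete, the following is a minimal tabular sketch, not the authors' implementation. It assumes a small discrete MDP, a count-based bonus of the form 1/sqrt(n + 1), and total variation distance as a stand-in for the distribution matching objective; all function names (`fit_counts`, `bonus`, `next_state_probs`, `optimistic_cost`) are hypothetical. The cost of a policy is its state-distribution mismatch with the expert under the learned model, minus the average bonus, so rarely tried state-action pairs look attractive until the model becomes confident about them.

```python
import math
from collections import defaultdict


def fit_counts(transitions):
    """Count (state, action, next_state) triples from the learner's own rollouts."""
    counts = defaultdict(lambda: defaultdict(int))
    for s, a, s_next in transitions:
        counts[(s, a)][s_next] += 1
    return counts


def bonus(counts, s, a, scale=1.0):
    """Assumed count-based bonus: largest where (s, a) has never been tried."""
    n = sum(counts[(s, a)].values()) if (s, a) in counts else 0
    return scale / math.sqrt(n + 1)


def next_state_probs(counts, s, a, states):
    """Empirical next-state distribution; uniform guess for unvisited (s, a)."""
    n = sum(counts[(s, a)].values()) if (s, a) in counts else 0
    if n == 0:
        return {t: 1.0 / len(states) for t in states}
    return {t: counts[(s, a)].get(t, 0) / n for t in states}


def optimistic_cost(policy, counts, expert_dist, init_dist, states, actions, horizon):
    """State-distribution mismatch under the learned model, minus the average bonus.

    policy[s][a] is the probability of action a in state s. Only expert *states*
    (expert_dist) are used, matching the observation-only setting.
    """
    d = dict(init_dist)
    avg_state = {s: 0.0 for s in states}
    avg_bonus = 0.0
    for _ in range(horizon):
        nxt = {s: 0.0 for s in states}
        for s in states:
            avg_state[s] += d[s] / horizon
            for a in actions:
                w = d[s] * policy[s][a]  # probability of visiting (s, a) at this step
                avg_bonus += w * bonus(counts, s, a) / horizon
                for t, p in next_state_probs(counts, s, a, states).items():
                    nxt[t] += w * p
        d = nxt
    # Total variation distance between averaged state distributions (an assumption).
    mismatch = 0.5 * sum(abs(avg_state[s] - expert_dist[s]) for s in states)
    return mismatch - avg_bonus
```

In a full loop, one would alternate between choosing a policy that minimizes this cost, executing it in the real environment, and refitting the counts on the new transitions. As data accumulates the bonus shrinks, and the cost reduces to plain state-distribution matching against the expert.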