Model Predictive Adversarial Imitation Learning for Planning from Observation
This work addresses reliable planning from observation-only demonstrations for robotics and control applications. It is an incremental advance that integrates existing methods.
The paper addresses learning to plan from ambiguous and incomplete human demonstrations by unifying inverse reinforcement learning with model predictive control. It reports significant improvements in sample efficiency, out-of-distribution generalization, and robustness in simulated and real-world experiments.
Human demonstration data is often ambiguous and incomplete, motivating imitation learning approaches that also exhibit reliable planning behavior. A common paradigm for planning from demonstration is to learn a reward function via Inverse Reinforcement Learning (IRL) and then deploy this reward via Model Predictive Control (MPC). Towards unifying these methods, we derive a replacement of the policy in IRL with a planning-based agent. With connections to Adversarial Imitation Learning, this formulation enables end-to-end interactive learning of planners from observation-only demonstrations. In addition to benefits in interpretability, complexity, and safety, we study and observe significant improvements in sample efficiency, out-of-distribution generalization, and robustness. The study includes evaluations on both simulated control benchmarks and real-world navigation experiments using few-to-single observation-only demonstrations.
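To make the idea of replacing the IRL policy with a planner concrete, the following is a minimal illustrative sketch, not the paper's implementation. It assumes a known one-dimensional dynamics model, a linear discriminator over state transitions (s, s') standing in for the learned reward, and a random-shooting planner standing in for MPC. All function names, features, and hyperparameters are hypothetical choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def dynamics(s, a):
    # Assumed known model: a 1-D point moved by a bounded action.
    return s + 0.1 * np.clip(a, -1.0, 1.0)

def features(s, s_next):
    # Observation-only features of a state transition (no actions).
    return np.stack([s, s_next, s_next - s, np.ones_like(s)], axis=-1)

def reward(w, s, s_next):
    # The discriminator logit serves as the planner's reward.
    return features(s, s_next) @ w

def mpc_action(w, s, horizon=5, n_samples=64):
    # Random-shooting MPC: sample action sequences, roll out the model,
    # and return the first action of the highest-return sequence.
    actions = rng.uniform(-1.0, 1.0, size=(n_samples, horizon))
    states = np.full(n_samples, s, dtype=float)
    returns = np.zeros(n_samples)
    for t in range(horizon):
        nxt = dynamics(states, actions[:, t])
        returns += reward(w, states, nxt)
        states = nxt
    return actions[np.argmax(returns), 0]

def rollout(w, s0=0.0, steps=20):
    # Collect the planner's (s, s') transitions under the current reward.
    s, pairs = s0, []
    for _ in range(steps):
        s_next = dynamics(s, mpc_action(w, s))
        pairs.append((s, s_next))
        s = s_next
    return np.array(pairs)

def update_discriminator(w, expert, agent, lr=0.5):
    # One logistic-regression gradient step: expert -> 1, agent -> 0.
    x = np.concatenate([features(expert[:, 0], expert[:, 1]),
                        features(agent[:, 0], agent[:, 1])])
    y = np.concatenate([np.ones(len(expert)), np.zeros(len(agent))])
    p = 1.0 / (1.0 + np.exp(-(x @ w)))
    return w + lr * x.T @ (y - p) / len(y)

# A single observation-only demonstration: the state moves right each step.
demo_states = 0.1 * np.arange(21)
expert = np.stack([demo_states[:-1], demo_states[1:]], axis=1)

# Adversarial loop: the planner acts as the "generator" in place of a policy.
w = np.zeros(4)
for _ in range(50):
    agent = rollout(w)
    w = update_discriminator(w, expert, agent)

final = rollout(w)
```

In this sketch there is no separate policy network to train: the only learned quantity is the reward (discriminator) weight vector `w`, and behavior comes from planning against it at every step. A practical system would likely use a learned dynamics model, a neural discriminator, and a stronger sampling-based or gradient-based planner.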