RO · AI · LG · Oct 2, 2023

Imitation Learning from Observation through Optimal Transport

MILA
arXiv:2310.01632v2 · 5 citations · h-index: 28
AI Analysis

The paper provides a simpler, more flexible method for imitation learning in robotics or AI systems where expert actions are unavailable, though the contribution is incremental because it builds on existing optimal transport ideas.

The paper tackles imitation learning from observation (ILfO) by simplifying optimal transport to generate rewards without learned models or adversarial learning, achieving expert-level performance across continuous control tasks with only a single expert trajectory and no actions.

Imitation Learning from Observation (ILfO) is a setting in which a learner tries to imitate the behavior of an expert, using only observational data and without the direct guidance of demonstrated actions. In this paper, we re-examine optimal transport for IL, in which a reward is generated based on the Wasserstein distance between the state trajectories of the learner and expert. We show that existing methods can be simplified to generate a reward function without requiring learned models or adversarial learning. Unlike many other state-of-the-art methods, our approach can be integrated with any RL algorithm and is amenable to ILfO. We demonstrate the effectiveness of this simple approach on a variety of continuous control tasks and find that it surpasses the state of the art in the ILfO setting, achieving expert-level performance across a range of evaluation domains even when observing only a single expert trajectory without actions.
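To make the idea of a Wasserstein-based reward concrete, the sketch below shows one generic way to turn an optimal transport plan between learner and expert state trajectories into per-step rewards. It is an illustration under stated assumptions, not the paper's exact procedure: the Euclidean cost, the uniform weights on time steps, the entropy-regularized Sinkhorn solver, and the names `ot_rewards`, `epsilon`, and `n_iters` are all choices made here for the example.

```python
import math


def ot_rewards(learner_states, expert_states, epsilon=0.05, n_iters=200):
    """Per-step rewards from an approximate OT plan (illustrative sketch).

    Assumptions made for this example only: Euclidean state cost, uniform
    mass on every time step, and Sinkhorn iterations as the OT solver.
    """
    n, m = len(learner_states), len(expert_states)

    # Pairwise cost between every learner state and every expert state.
    cost = [[math.dist(x, y) for y in expert_states] for x in learner_states]

    # Scale the regularization by the largest cost so exp() stays well-behaved.
    scale = max(max(row) for row in cost) or 1.0
    kernel = [[math.exp(-c / (epsilon * scale)) for c in row] for row in cost]

    # Uniform marginals: each time step carries equal mass.
    a, b = 1.0 / n, 1.0 / m
    u, v = [1.0] * n, [1.0] * m

    # Sinkhorn iterations: alternately rescale rows and columns of the kernel.
    for _ in range(n_iters):
        u = [a / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]

    # Reward for learner step i is the negative transport cost of its mass.
    return [
        -sum(u[i] * kernel[i][j] * v[j] * cost[i][j] for j in range(m))
        for i in range(n)
    ]
```

Because the rewards depend only on states, a function like this could in principle label the transitions of any RL algorithm's rollouts, which matches the abstract's point that no expert actions, learned models, or adversarial training are needed. A learner trajectory that stays close to the expert's receives rewards near zero, while one that drifts away receives increasingly negative rewards.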
