Adversarial Option-Aware Hierarchical Imitation Learning
This paper addresses hierarchical imitation learning for robotic and other AI agents, though the contribution appears incremental because it builds on prior methods such as GAIL and existing hierarchical frameworks.
The paper tackles the challenge of learning skills from long-horizon, unannotated demonstrations by proposing Option-GAIL, a method that models the task hierarchy with options and trains the policy via generative adversarial optimization; the authors report that it consistently outperforms existing approaches across a variety of tasks.
Learning skills for an agent from long-horizon, unannotated demonstrations has long been a challenge. Existing approaches like Hierarchical Imitation Learning (HIL) are prone to compounding errors or suboptimal solutions. In this paper, we propose Option-GAIL, a novel method to learn skills over long horizons. The key idea of Option-GAIL is to model the task hierarchy with options and to train the policy via generative adversarial optimization. In particular, we propose an Expectation-Maximization (EM)-style algorithm: an E-step that samples the expert's options conditioned on the currently learned policy, and an M-step that updates the low- and high-level policies of the agent simultaneously to minimize the newly proposed option-occupancy measurement between the expert and the agent. We theoretically prove the convergence of the proposed algorithm. Experiments show that Option-GAIL consistently outperforms its counterparts across a variety of tasks.