Imitation Learning by Reinforcement Learning
This paper provides a method for imitation learning that leverages reinforcement learning techniques, potentially benefiting robotics and control applications. It appears incremental, however, because it builds on existing imitation and reinforcement learning frameworks.
The paper tackles imitation learning by reducing it to reinforcement learning with a stationary reward for deterministic experts, theoretically certifying reward recovery and bounding the total variation distance between expert and learner policies. Experiments confirm the reduction works well in practice for continuous control tasks.
Imitation learning algorithms learn a policy from demonstrations of expert behavior. We show that, for deterministic experts, imitation learning can be done by reduction to reinforcement learning with a stationary reward. Our theoretical analysis both certifies the recovery of expert reward and bounds the total variation distance between the expert and the imitation learner, showing a link to adversarial imitation learning. We conduct experiments which confirm that our reduction works well in practice for continuous control tasks.
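The reduction described above can be illustrated with a small tabular sketch. The abstract does not spell out the reward, so the code assumes one natural stationary choice: an indicator that pays 1 on state-action pairs seen in the demonstrations and 0 elsewhere. The toy ring environment, the `expert_policy` list, and the helper names are hypothetical, and value iteration stands in for a generic reinforcement learning solver; the paper's experiments concern continuous control, which this sketch does not attempt to reproduce.

```python
import numpy as np

def make_imitation_reward(expert_pairs):
    """Assumed stationary reward: 1 on demonstrated (state, action) pairs, 0 elsewhere."""
    expert_set = set(expert_pairs)

    def reward(state, action):
        return 1.0 if (state, action) in expert_set else 0.0

    return reward

def value_iteration(n_states, n_actions, step, reward, gamma=0.9, iters=200):
    """Stand-in for an RL solver on a small deterministic MDP; returns the Q-table."""
    q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        v = q.max(axis=1)  # greedy state values from the previous sweep
        for s in range(n_states):
            for a in range(n_actions):
                q[s, a] = reward(s, a) + gamma * v[step(s, a)]
    return q

# Hypothetical toy environment: a ring of 5 states.
# Action 0 stays in place, action 1 moves right, action 2 moves left.
N_STATES, N_ACTIONS = 5, 3

def step(state, action):
    return (state + (0, 1, -1)[action]) % N_STATES

# Hypothetical deterministic expert and its demonstrations (one pair per state).
expert_policy = [1, 1, 2, 0, 1]
expert_pairs = [(s, expert_policy[s]) for s in range(N_STATES)]

reward = make_imitation_reward(expert_pairs)
q_table = value_iteration(N_STATES, N_ACTIONS, step, reward)
learned_policy = q_table.argmax(axis=1).tolist()

print("expert :", expert_policy)
print("learner:", learned_policy)
```

In this toy case the demonstrations cover every state, so maximizing the indicator reward reproduces the expert's action everywhere. With demonstrations that cover only part of the state space, the same reward would instead encourage the learner to steer back toward demonstrated state-action pairs.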