Addressing reward bias in Adversarial Imitation Learning with neutral reward functions
The paper addresses a fundamental problem in imitation learning for task-based environments, though the contribution appears incremental because it builds on existing GAIL methods.
The paper tackles reward bias in Generative Adversarial Imitation Learning by proposing a new reward function that outperforms existing methods in task-based environments with single and multiple terminal states, effectively overcoming survival and termination bias.
Generative Adversarial Imitation Learning (GAIL) suffers from the fundamental problem of reward bias, which stems from the choice of reward function used in the algorithm. Different biases affect different types of environments, which are broadly divided into survival and task-based environments. We provide a theoretical sketch of why existing reward functions fail in imitation learning scenarios in task-based environments with multiple terminal states. We also propose a new reward function for GAIL that outperforms existing GAIL methods on task-based environments with single and multiple terminal states and effectively overcomes both survival and termination bias.
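The two biases named in the abstract can be illustrated with the two discriminator-based rewards commonly used in GAIL implementations. The sketch below is only an illustration and does not show the reward function proposed in the paper, whose form is not stated here. It assumes that D(s, a) is the discriminator's probability that a state-action pair comes from the expert, that the discriminator output is constant along the episode, and that returns are undiscounted. The function names are hypothetical.

```python
import math


def positive_reward(d: float) -> float:
    """r = -log(1 - D). Non-negative for every D in (0, 1)."""
    return -math.log(1.0 - d)


def negative_reward(d: float) -> float:
    """r = log(D). Non-positive for every D in (0, 1)."""
    return math.log(d)


def episode_return(reward_fn, d: float, steps: int) -> float:
    """Undiscounted return of an episode with a constant discriminator output d.

    Assumption for illustration only: a real discriminator output varies per step.
    """
    return sum(reward_fn(d) for _ in range(steps))


if __name__ == "__main__":
    d = 0.5  # discriminator is undecided about every step
    for name, fn in [("positive", positive_reward), ("negative", negative_reward)]:
        short = episode_return(fn, d, steps=10)
        long = episode_return(fn, d, steps=100)
        print(f"{name}: short episode = {short:.2f}, long episode = {long:.2f}")
```

With the always-positive reward, the longer episode collects more return regardless of whether the task is completed, which is the survival bias. With the always-negative reward, the shorter episode is preferred, so the agent is pushed toward any terminal state, which is the termination bias.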