LG, AI | Apr 14, 2021

Reward function shape exploration in adversarial imitation learning: an empirical study

arXiv:2104.06687v1 | 5 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the implicit reward bias problem in adversarial imitation learning (AIL) and is aimed at reinforcement learning researchers and practitioners. Its contribution is incremental: it empirically compares reward function shapes rather than introducing a new method.

The study investigates how the shape of the reward function affects performance in AIL on continuous control tasks. It finds that a positive logarithmic reward function works well in typical tasks, whereas the so-called unbiased reward function works only in specific kinds of tasks. Several of the other reward shapes the authors designed also perform excellently.

Adversarial imitation learning algorithms (AILs) obtain no true rewards from the environment for learning the policy. Instead, they rely on pseudo rewards computed from the output of the discriminator. Given the implicit reward bias problem in AILs, we design several representative reward function shapes and compare their performance in large-scale experiments. To ensure the reliability of our results, we run the experiments on a series of MuJoCo and Box2D continuous control tasks with four different AILs. We also compare the reward function shapes under varying numbers of expert trajectories. The empirical results reveal that the positive logarithmic reward function works well in typical continuous control tasks. In contrast, the so-called unbiased reward function is limited to specific kinds of tasks. Furthermore, several of the designed reward functions also perform excellently in these environments.
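To make the idea of a reward function shape concrete, the sketch below computes three pseudo rewards from a discriminator output. It rests on assumptions the abstract does not state: D(s, a) is taken to be the discriminator's probability that a state-action pair came from the expert, and the three formulas are forms commonly seen in the AIL literature, not necessarily the exact definitions used in the paper. The function names and the clipping constant `EPS` are hypothetical.

```python
import numpy as np

# Hypothetical clipping constant that keeps the logarithms finite.
EPS = 1e-8


def positive_log_reward(d):
    """r = -log(1 - D). Never negative (assumed form of the 'positive logarithmic' shape)."""
    d = np.clip(d, EPS, 1.0 - EPS)
    return -np.log(1.0 - d)


def negative_log_reward(d):
    """r = log(D). Never positive."""
    d = np.clip(d, EPS, 1.0 - EPS)
    return np.log(d)


def logit_reward(d):
    """r = log(D) - log(1 - D). Can take either sign (assumed form of the 'unbiased' shape)."""
    d = np.clip(d, EPS, 1.0 - EPS)
    return np.log(d) - np.log(1.0 - d)


# Illustrative discriminator outputs, not data from the paper.
d = np.array([0.1, 0.5, 0.9])
for name, fn in [
    ("positive log", positive_log_reward),
    ("negative log", negative_log_reward),
    ("logit", logit_reward),
]:
    print(name, fn(d))
```

Under these assumptions, the sign of the pseudo reward is where the implicit bias comes from. A reward that is never negative pays the agent for every additional step, while a reward that is never positive penalizes every additional step. The logit form can go either way depending on the discriminator's output.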
