LG · Feb 5, 2022

Rethinking ValueDice: Does It Really Improve Performance?

Ziniu Li, Tian Xu, Yang Yu, Zhi-Quan Luo

arXiv:2202.02468v2 · 12.417 citations

Originality: Synthesis-oriented

AI Analysis

This work critically assesses ValueDice, a popular imitation learning method, and shows researchers in the field where its gains are limited or incremental.

The paper investigates whether ValueDice's performance gains in adversarial imitation learning stem from algorithmic advances. It finds that ValueDice reduces to Behavioral Cloning in the offline setting, that its success depends on regularization and on complete expert trajectories, and that it has no inherent superiority.

Since the introduction of GAIL, adversarial imitation learning (AIL) methods have attracted considerable research interest. Among these methods, ValueDice has achieved significant improvements: it beats the classical approach Behavioral Cloning (BC) in the offline setting, and it requires fewer interactions than GAIL in the online setting. Do these improvements stem from more advanced algorithm design? We answer this question with the following conclusions. First, we show that ValueDice can reduce to BC in the offline setting. Second, we verify that overfitting exists and that regularization matters in the low-data regime. Specifically, we demonstrate that with weight decay, BC also nearly matches expert performance, as ValueDice does. These first two claims explain the superior offline performance of ValueDice. Third, we establish that ValueDice does not work when the expert trajectory is subsampled. Instead, the aforementioned success of ValueDice holds when the expert trajectory is complete, in which case ValueDice is closely related to BC, which performs well as mentioned. Finally, we discuss the implications of our research for imitation learning studies beyond ValueDice.
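
To make the second claim concrete, here is a minimal sketch of Behavioral Cloning with weight decay. It is an illustration only, not the authors' implementation: the linear policy, the squared-error loss, the synthetic "expert" data, and the names `bc_fit`, `true_W`, `W_plain`, and `W_decay` are all assumptions made for this example.

```python
import numpy as np

def bc_fit(states, actions, weight_decay=0.0, lr=0.1, steps=500):
    """Fit a linear policy a = s @ W by gradient descent.

    The loss is mean squared error on expert (state, action) pairs
    plus an L2 penalty (weight decay) on W. Hypothetical helper.
    """
    n, d = states.shape
    W = np.zeros((d, actions.shape[1]))
    for _ in range(steps):
        pred = states @ W
        # Gradient of 0.5 * MSE plus gradient of 0.5 * weight_decay * ||W||^2.
        grad = states.T @ (pred - actions) / n + weight_decay * W
        W -= lr * grad
    return W

# Synthetic low-data regime: only 8 expert pairs (assumed, not from the paper).
rng = np.random.default_rng(0)
true_W = rng.normal(size=(4, 2))
states = rng.normal(size=(8, 4))
actions = states @ true_W + 0.1 * rng.normal(size=(8, 2))

W_plain = bc_fit(states, actions)                    # plain BC
W_decay = bc_fit(states, actions, weight_decay=0.1)  # BC with weight decay
```

Weight decay shrinks the learned weights toward zero, which is the regularization effect the abstract refers to. Whether it closes the gap to expert performance depends on the task and is what the paper evaluates empirically.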
