LG, ML | Sep 9, 2018

Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning

arXiv:1809.02925v2 | 293 citations
Originality: Incremental advance
AI Analysis

The paper addresses sample inefficiency and reward bias in adversarial imitation learning, which makes the approach more practical for applications such as robotics. The advance is incremental because it builds on existing adversarial frameworks.

The paper tackles sample inefficiency and reward bias in adversarial imitation learning by proposing Discriminator-Actor-Critic, which uses off-policy reinforcement learning to reduce policy-environment interaction sample complexity by an average factor of 10 and a reward function designed to be unbiased, so that it applies to many problems without task-specific adjustments.

We identify two issues with the family of algorithms based on the Adversarial Imitation Learning framework. The first problem is implicit bias present in the reward functions used in these algorithms. While these biases might work well for some environments, they can also lead to sub-optimal behavior in others. Secondly, even though these algorithms can learn from few expert demonstrations, they require a prohibitively large number of interactions with the environment in order to imitate the expert for many real-world applications. In order to address these issues, we propose a new algorithm called Discriminator-Actor-Critic that uses off-policy Reinforcement Learning to reduce policy-environment interaction sample complexity by an average factor of 10. Furthermore, since our reward function is designed to be unbiased, we can apply our algorithm to many problems without making any task-specific adjustments.
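The reward bias described in the abstract can be illustrated with a minimal sketch. It assumes the reward forms commonly used in adversarial imitation learning implementations, where `d` is the discriminator's estimated probability that a state-action pair comes from the expert. The function names are hypothetical and chosen for illustration; this is not the authors' code.

```python
import math

# Assumption: d is the discriminator output in (0, 1), read as the
# probability that a state-action pair was produced by the expert.

def reward_positive(d):
    """r = -log(1 - d): strictly positive for every d in (0, 1)."""
    return -math.log(1.0 - d)

def reward_negative(d):
    """r = log(d): strictly negative for every d in (0, 1)."""
    return math.log(d)

def reward_difference(d):
    """r = log(d) - log(1 - d): negative below d = 0.5, positive above."""
    return math.log(d) - math.log(1.0 - d)

# Print the three reward values for a low, neutral, and high discriminator score.
for d in (0.1, 0.5, 0.9):
    print(d, reward_positive(d), reward_negative(d), reward_difference(d))
```

A reward that is always positive pays the agent for every extra step, so it favors longer episodes regardless of how well the agent imitates. A reward that is always negative does the opposite and favors ending the episode early. Which bias helps or hurts depends on the environment, which is consistent with the abstract's remark that these biases work in some environments and lead to sub-optimal behavior in others.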

Code Implementations: 3 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
