LG · RO · Jul 6, 2023

Policy Contrastive Imitation Learning

arXiv:2307.02829v1 · 4 citations · h-index: 67
Originality: Highly original
AI Analysis

The paper addresses a bottleneck in imitation learning for robotics and control tasks and offers a novel method for improving representation quality. The contribution is incremental, however, because it builds on existing AIL frameworks.

The paper tackles low-quality discriminator representations in adversarial imitation learning (AIL), which lead to unsatisfactory performance on challenging tasks. It proposes Policy Contrastive Imitation Learning (PCIL), which learns a contrastive representation space and derives a smooth, cosine-similarity-based reward, achieving state-of-the-art performance on the DeepMind Control suite.
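To make the idea of a "smooth cosine-similarity-based reward" concrete, here is a minimal NumPy sketch. It assumes (this is not taken from the paper) that an encoder has already mapped transitions to embeddings, and that each agent transition is rewarded by its cosine similarity to the normalized mean of the expert embeddings. The function names `l2_normalize` and `cosine_reward` are hypothetical. Because cosine similarity varies continuously with the embedding, nearby agent behaviour receives nearby rewards, in contrast to a discriminator logit that can saturate.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Project each row onto the unit sphere so dot products become cosines."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def cosine_reward(agent_emb: np.ndarray, expert_emb: np.ndarray) -> np.ndarray:
    """Hypothetical PCIL-style reward (an assumption, not the paper's exact rule).

    agent_emb:  (N, d) embeddings of agent transitions
    expert_emb: (M, d) embeddings of expert transitions
    returns:    (N,) rewards in [-1, 1]
    """
    # Average the unit-norm expert embeddings, then re-normalize the centroid.
    expert_center = l2_normalize(l2_normalize(expert_emb).mean(axis=0))
    # Cosine similarity of every agent embedding to the expert centroid.
    return l2_normalize(agent_emb) @ expert_center


if __name__ == "__main__":
    expert = np.array([[1.0, 0.0], [1.0, 0.1]])
    agent = np.array([[1.0, 0.05], [0.0, 1.0]])
    print(np.round(cosine_reward(agent, expert), 3))
```

An agent embedding that points the same way as the expert embeddings gets a reward near 1; an orthogonal one gets a reward near 0.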

Adversarial imitation learning (AIL) is a popular method that has recently achieved considerable success. However, the performance of AIL is still unsatisfactory on more challenging tasks. We find that one of the major reasons is the low quality of the AIL discriminator's representation. Because the AIL discriminator is trained via binary classification, which does not necessarily distinguish the policy from the expert in a meaningful way, the resulting reward may not be meaningful either. We propose a new method called Policy Contrastive Imitation Learning (PCIL) to resolve this issue. PCIL learns a contrastive representation space by anchoring on different policies and generates a smooth cosine-similarity-based reward. Our proposed representation learning objective can be viewed as a stronger version of the AIL objective and provides a more meaningful comparison between the agent and the expert. From a theoretical perspective, we show the validity of our method using the apprenticeship learning framework. Furthermore, our empirical evaluation on the DeepMind Control suite demonstrates that PCIL can achieve state-of-the-art performance. Finally, qualitative results suggest that PCIL builds a smoother and more meaningful representation space for imitation learning.
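The contrast between a binary-classification objective and a contrastive one "anchored on different policies" can be illustrated with a generic InfoNCE-style loss. This is a sketch under stated assumptions, not the paper's objective: it assumes expert samples serve as anchors, other expert samples as positives, and agent samples as negatives, with cosine similarity scaled by a hypothetical `temperature`. The function name `policy_contrastive_loss` is illustrative only.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Project each row onto the unit sphere so dot products become cosines."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def policy_contrastive_loss(expert_anchor: np.ndarray,
                            expert_pos: np.ndarray,
                            agent_neg: np.ndarray,
                            temperature: float = 0.1) -> float:
    """InfoNCE-style loss (illustrative assumption, not PCIL's exact form).

    expert_anchor: (B, d) embeddings of expert samples used as anchors
    expert_pos:    (B, d) embeddings of other expert samples (positives)
    agent_neg:     (K, d) embeddings of agent samples (negatives)
    """
    a = l2_normalize(expert_anchor)
    p = l2_normalize(expert_pos)
    n = l2_normalize(agent_neg)
    # Cosine similarity of each anchor to its positive: shape (B, 1).
    pos_logits = np.sum(a * p, axis=1, keepdims=True) / temperature
    # Cosine similarity of each anchor to every agent negative: shape (B, K).
    neg_logits = (a @ n.T) / temperature
    logits = np.concatenate([pos_logits, neg_logits], axis=1)
    # Numerically stable log-softmax; the positive sits in column 0.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob_pos = logits[:, 0] - np.log(np.exp(logits).sum(axis=1))
    return float(-log_prob_pos.mean())
```

Unlike a binary classifier, which only needs some decision boundary between expert and agent samples, this loss explicitly pulls expert embeddings together and pushes agent embeddings away in angle, which is what makes a cosine-similarity reward on the learned space meaningful.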
