LG AI, May 27, 2025

Learning What to Do and What Not To Do: Offline Imitation from Expert and Undesirable Demonstrations

Huy Hoang, Tien Mai, Pradeep Varakantham, Tanvi Verma

arXiv:2505.21182v14.1 h-index: 4

Originality: Incremental advance

AI Analysis

This work addresses offline imitation learning when the dataset contains explicitly undesirable demonstrations alongside expert ones, using the undesirable data as an additional training signal. It represents an incremental advance in the field.

The paper tackles offline imitation learning by incorporating both expert and undesirable demonstrations, proposing a novel formulation based on a difference of KL divergences that avoids adversarial training and unifies handling of positive and negative data. It demonstrates consistent outperformance over state-of-the-art baselines in experiments on standard benchmarks.

Offline imitation learning typically learns from expert and unlabeled demonstrations, yet often overlooks the valuable signal in explicitly undesirable behaviors. In this work, we study offline imitation learning from contrasting behaviors, where the dataset contains both expert and undesirable demonstrations. We propose a novel formulation that optimizes a difference of KL divergences over the state-action visitation distributions of expert and undesirable (or bad) data. Although the resulting objective is a DC (Difference-of-Convex) program, we prove that it becomes convex when expert demonstrations outweigh undesirable demonstrations, enabling a practical and stable non-adversarial training objective. Our method avoids adversarial training and handles both positive and negative demonstrations in a unified framework. Extensive experiments on standard offline imitation learning benchmarks demonstrate that our approach consistently outperforms state-of-the-art baselines.
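The convexity claim can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes small discrete distributions in place of state-action visitation distributions, and a hypothetical weight `alpha` on the undesirable term (the paper's exact parametrization may differ). Under these assumptions, KL(d || d_expert) - alpha * KL(d || d_bad) equals (1 - alpha) * sum(d log d) plus terms linear in d, so it is convex in d when alpha <= 1 and strictly concave when alpha > 1. The code checks midpoint convexity numerically on random pairs of distributions.

```python
import numpy as np


def kl(p, q):
    """KL(p || q) for strictly positive discrete distributions."""
    return float(np.sum(p * np.log(p / q)))


def dc_objective(d, d_expert, d_bad, alpha):
    """Difference of KL divergences: KL(d || d_expert) - alpha * KL(d || d_bad).

    alpha is a hypothetical weight on the undesirable term (an assumption
    made for this illustration).
    """
    return kl(d, d_expert) - alpha * kl(d, d_bad)


def random_dist(rng, n):
    """Sample a strictly positive probability vector of length n."""
    x = rng.random(n) + 1e-3
    return x / x.sum()


def midpoint_gap(alpha, trials=2000, n=6, seed=0):
    """Largest value of f((a+b)/2) - (f(a)+f(b))/2 over random pairs (a, b).

    A convex f never gives a positive gap (up to floating-point error).
    """
    rng = np.random.default_rng(seed)
    d_expert = random_dist(rng, n)
    d_bad = random_dist(rng, n)
    worst = -np.inf
    for _ in range(trials):
        a = random_dist(rng, n)
        b = random_dist(rng, n)
        mid = 0.5 * (a + b)
        f_mid = dc_objective(mid, d_expert, d_bad, alpha)
        f_avg = 0.5 * (
            dc_objective(a, d_expert, d_bad, alpha)
            + dc_objective(b, d_expert, d_bad, alpha)
        )
        worst = max(worst, f_mid - f_avg)
    return worst


if __name__ == "__main__":
    for alpha in (0.5, 1.0, 2.0):
        print(f"alpha={alpha}: worst midpoint gap = {midpoint_gap(alpha):.3e}")
```

With `alpha` at or below 1 the worst gap should stay at or below zero up to rounding, while `alpha` above 1 should produce a positive gap, matching the intuition that the expert term must carry at least as much weight as the undesirable term for the objective to remain convex.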
