LG | AI | Nov 2, 2023

Offline Imitation from Observation via Primal Wasserstein State Occupancy Matching

arXiv:2311.01331v3 | 3 citations | h-index: 67 | Has Code
AI Analysis

This work addresses the challenge of learning from expert states without actions when real-world interaction is costly. Its support for more flexible distance metrics makes it adaptable to domain-specific applications.

The paper tackles the problem of offline imitation learning from observations by proposing PW-DICE, which minimizes the primal Wasserstein distance between learner and expert state occupancies with a contrastively learned metric, improving upon state-of-the-art methods in empirical evaluations.

In real-world scenarios, arbitrary interactions with the environment can often be costly, and actions of expert demonstrations are not always available. To reduce the need for both, offline Learning from Observations (LfO) is extensively studied: the agent learns to solve a task given only expert states and task-agnostic non-expert state-action pairs. The state-of-the-art DIstribution Correction Estimation (DICE) methods, as exemplified by SMODICE, minimize the state occupancy divergence between the learner's and empirical expert policies. However, such methods are limited to either $f$-divergences (KL and $\chi^2$) or Wasserstein distance with Rubinstein duality, the latter of which constrains the underlying distance metric crucial to the performance of Wasserstein-based solutions. To enable more flexible distance metrics, we propose Primal Wasserstein DICE (PW-DICE). It minimizes the primal Wasserstein distance between the learner and expert state occupancies and leverages a contrastively learned distance metric. Theoretically, our framework is a generalization of SMODICE, and is the first work that unifies $f$-divergence and Wasserstein minimization. Empirically, we find that PW-DICE improves upon several state-of-the-art methods. The code is available at https://github.com/KaiYan289/PW-DICE.
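To make the "primal Wasserstein distance with a flexible metric" idea concrete, here is a minimal sketch, not the PW-DICE algorithm itself. It only evaluates the primal objective $\min_{\Pi} \sum_{i,j} \Pi_{ij}\, c(x_i, y_j)$ between two fixed, equally sized sets of states with uniform weights, whereas the method described above must also optimize the learner's occupancy. In this special case an optimal plan $\Pi$ is a permutation, so brute-force enumeration is exact for tiny inputs. The function names, the sample states, and the `first_coordinate_metric` stand-in for a learned metric are all assumptions made for illustration.

```python
from itertools import permutations
import math


def primal_wasserstein_uniform(xs, ys, cost):
    """Exact primal Wasserstein cost between two uniform empirical
    distributions with equally many samples (tiny inputs only)."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally many samples")
    n = len(xs)
    # Pairwise cost matrix c(x_i, y_j); any cost function can be plugged in.
    c = [[cost(x, y) for y in ys] for x in xs]
    # With uniform weights an optimal transport plan is a permutation,
    # so searching over all permutations solves the primal problem exactly.
    best = min(sum(c[i][p[i]] for i in range(n))
               for p in permutations(range(n)))
    return best / n


def euclidean(x, y):
    return math.dist(x, y)


def first_coordinate_metric(x, y):
    # Hypothetical stand-in for a learned metric: it ignores the
    # second coordinate entirely.
    return abs(x[0] - y[0])


# Made-up 2-D states, purely for illustration.
learner_states = [(0.0, 0.0), (1.0, 0.0), (2.0, 5.0)]
expert_states = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]

w_euclid = primal_wasserstein_uniform(learner_states, expert_states, euclidean)
w_metric = primal_wasserstein_uniform(
    learner_states, expert_states, first_coordinate_metric
)
print(w_euclid, w_metric)
```

The two calls differ only in the cost function, yet they rank the same learner states very differently. This is why the choice of underlying metric matters for Wasserstein-based objectives, and why a formulation that accepts an arbitrary cost is attractive.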
