LG AI · Nov 2, 2023

A Simple Solution for Offline Imitation from Observations and Examples with Possibly Incomplete Trajectories

Kai Yan, Alexander G. Schwing, Yu-Xiong Wang

arXiv:2311.01329v1 · 6.67 citations · h-index: 67

Originality: Incremental advance

AI Analysis

This work addresses a stability issue in offline imitation learning for settings where expert actions are unavailable and interactions are costly. The advance is incremental, since it builds on prior DICE methods.

The paper tackles offline imitation learning from observations with incomplete trajectories. It proposes TAILO, which uses a discounted sum along the future trajectory as the weight for weighted behavior cloning, and reports more robust and effective performance across multiple testbeds.

Offline imitation from observations aims to solve MDPs where only task-specific expert states and task-agnostic non-expert state-action pairs are available. Offline imitation is useful in real-world scenarios where arbitrary interactions are costly and expert actions are unavailable. The state-of-the-art "DIstribution Correction Estimation" (DICE) methods minimize the divergence of state occupancy between expert and learner policies and retrieve a policy with weighted behavior cloning; however, their results are unstable when learning from incomplete trajectories, due to a non-robust optimization in the dual domain. To address the issue, in this paper, we propose Trajectory-Aware Imitation Learning from Observations (TAILO). TAILO uses a discounted sum along the future trajectory as the weight for weighted behavior cloning. The terms for the sum are scaled by the output of a discriminator, which aims to identify expert states. Despite its simplicity, TAILO works well if there exist trajectories or segments of expert behavior in the task-agnostic data, a common assumption in prior work. In experiments across multiple testbeds, we find TAILO to be more robust and effective, particularly with incomplete trajectories.
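
The weighting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of raw discriminator scores as the summed terms, and the discount value are all assumptions made for illustration, and the paper may transform the discriminator output before summing. The sketch only shows how a discounted sum over the remaining steps of a trajectory segment can serve as a per-step weight for weighted behavior cloning.

```python
import numpy as np


def discounted_future_weights(scores, gamma=0.98):
    """Hypothetical helper: weights[i] = sum over j >= i of gamma**(j - i) * scores[j].

    `scores` stands in for per-state discriminator outputs along one
    trajectory segment (assumed here to be larger for expert-like states).
    The sum simply ends where the segment ends, so an incomplete
    trajectory needs no special handling.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.zeros_like(scores)
    running = 0.0
    # Walk backwards so each weight reuses the discounted sum of later steps.
    for i in range(len(scores) - 1, -1, -1):
        running = scores[i] + gamma * running
        weights[i] = running
    return weights


def weighted_bc_loss(log_probs, weights):
    """Hypothetical helper: weighted behavior cloning objective.

    `log_probs` are the learner policy's log-likelihoods of the dataset
    actions; the loss is the negative weight-averaged log-likelihood.
    """
    log_probs = np.asarray(log_probs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return -np.sum(weights * log_probs) / np.sum(weights)
```

Under these assumptions, a state-action pair receives a large weight when the states that follow it in the same segment look expert-like to the discriminator, so the cloning loss emphasizes actions that lead toward expert states.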
