LG · AI · Mar 31, 2021

DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation

arXiv:2104.00163v1 · 7 citations
Originality: Incremental advance
AI Analysis

This work addresses a practical limitation for deploying imitation learning in real-world settings where data collection is costly, though it is incremental as it builds on existing adversarial frameworks.

The paper tackles the high sample complexity in adversarial imitation learning from observation by integrating model-based reinforcement learning ideas, resulting in a more data-efficient algorithm that achieves similar or better performance with far fewer environment interactions.

In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior without access to the control signals generated by the demonstrator. Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms. This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk. In this work, we hypothesize that we can incorporate ideas from model-based reinforcement learning with adversarial methods for IfO in order to increase the data efficiency of these methods without sacrificing performance. Specifically, we consider time-varying linear Gaussian policies, and propose a method that integrates the linear-quadratic regulator with path integral policy improvement into an existing adversarial IfO framework. The result is a more data-efficient IfO algorithm with better performance, which we show empirically in four simulation domains: using far fewer interactions with the environment, the proposed method exhibits similar or better performance than the existing technique.
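
For readers unfamiliar with the policy class named in the abstract, the following is a minimal sketch of a time-varying linear Gaussian policy, in which the action at step t is drawn from a Gaussian with mean K_t x_t + k_t and covariance Sigma_t. It is not the DEALIO algorithm: the class name, the `rollout` helper, the toy double-integrator dynamics, and all numeric values are assumptions chosen for illustration, and the LQR and path integral updates that would fit K_t, k_t, and Sigma_t are omitted.

```python
import numpy as np


class TimeVaryingLinearGaussianPolicy:
    """pi(u_t | x_t) = N(K_t x_t + k_t, Sigma_t), one parameter set per time step."""

    def __init__(self, K, k, Sigma):
        self.K = np.asarray(K, dtype=float)          # shape (T, du, dx)
        self.k = np.asarray(k, dtype=float)          # shape (T, du)
        self.Sigma = np.asarray(Sigma, dtype=float)  # shape (T, du, du)

    def mean(self, t, x):
        # Mean action at step t: linear feedback plus an open-loop offset.
        return self.K[t] @ x + self.k[t]

    def sample(self, t, x, rng):
        # Draw one action from the Gaussian for step t.
        return rng.multivariate_normal(self.mean(t, x), self.Sigma[t])


def rollout(policy, x0, step, rng):
    """Run the policy for its full horizon under a user-supplied step function."""
    x = np.asarray(x0, dtype=float)
    states, actions = [x], []
    for t in range(policy.K.shape[0]):
        u = policy.sample(t, x, rng)
        x = step(x, u)
        states.append(x)
        actions.append(u)
    return np.stack(states), np.stack(actions)


# Hypothetical toy system: a double integrator with state [position, velocity].
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])


def step(x, u):
    return A @ x + B @ u


T = 20
K = np.tile(np.array([[-1.0, -0.5]]), (T, 1, 1))  # same gain at every step, for simplicity
k = np.zeros((T, 1))
Sigma = np.tile(0.01 * np.eye(1), (T, 1, 1))      # small exploration noise

policy = TimeVaryingLinearGaussianPolicy(K, k, Sigma)
rng = np.random.default_rng(0)
states, actions = rollout(policy, [1.0, 0.0], step, rng)
```

Because each time step has its own gain, offset, and covariance, a policy of this form can be refit locally around sampled trajectories, which is the structure that trajectory-centric methods such as LQR-based updates rely on.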
