LG · AI · Jul 20, 2017

RAIL: Risk-Averse Imitation Learning

arXiv:1707.06658v4 · 23 citations
Originality: Incremental advance
AI Analysis

RAIL addresses reliability concerns in risk-sensitive applications such as robotic surgery and autonomous driving, offering an incremental improvement over GAIL.

The paper tackles the problem that agents trained with Generative Adversarial Imitation Learning (GAIL) produce high-cost, catastrophic-failure trajectories more often than the expert does. It proposes RAIL, a risk-averse variant of GAIL that minimizes tail risk, measured by the Conditional-Value-at-Risk (CVaR) of trajectories. The resulting policies show lower tail-end risk than those of vanilla GAIL.

Imitation learning algorithms learn viable policies by imitating an expert's behavior when reward signals are not available. Generative Adversarial Imitation Learning (GAIL) is a state-of-the-art algorithm for learning policies when the expert's behavior is available as a fixed set of trajectories. We evaluate in terms of the expert's cost function and observe that the distribution of trajectory-costs is often more heavy-tailed for GAIL-agents than the expert at a number of benchmark continuous-control tasks. Thus, high-cost trajectories, corresponding to tail-end events of catastrophic failure, are more likely to be encountered by the GAIL-agents than the expert. This makes the reliability of GAIL-agents questionable when it comes to deployment in risk-sensitive applications like robotic surgery and autonomous driving. In this work, we aim to minimize the occurrence of tail-end events by minimizing tail risk within the GAIL framework. We quantify tail risk by the Conditional-Value-at-Risk (CVaR) of trajectories and develop the Risk-Averse Imitation Learning (RAIL) algorithm. We observe that the policies learned with RAIL show lower tail-end risk than those of vanilla GAIL. Thus the proposed RAIL algorithm appears as a potent alternative to GAIL for improved reliability in risk-sensitive applications.
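To make the tail-risk measure in the abstract concrete, here is a minimal sketch of how Value-at-Risk (VaR) and CVaR could be estimated from a sample of trajectory costs. It is not the RAIL training procedure and is not taken from the paper's code. The function name `var_cvar`, the level `alpha`, and the sample costs are hypothetical, and costs are assumed to be losses (higher is worse). CVaR is computed with the standard Rockafellar-Uryasev form, nu + E[(cost - nu)^+] / (1 - alpha), evaluated at nu = VaR.

```python
import math


def var_cvar(costs, alpha=0.9):
    """Return the empirical (VaR, CVaR) at level alpha for trajectory costs.

    Assumption: costs are losses, so the "tail" is the high-cost end.
    """
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")

    ordered = sorted(costs)
    n = len(ordered)

    # Empirical alpha-quantile: the smallest cost with at least an
    # alpha fraction of the sample at or below it.
    k = max(math.ceil(alpha * n) - 1, 0)
    var = ordered[k]

    # Rockafellar-Uryasev form evaluated at nu = VaR:
    # CVaR = VaR + mean of the excess over VaR, divided by (1 - alpha).
    mean_excess = sum(max(c - var, 0.0) for c in ordered) / n
    cvar = var + mean_excess / (1.0 - alpha)
    return var, cvar


# Hypothetical cost samples with the same mean (2.0) but different tails.
expert_costs = [2.0, 2.0, 2.0, 2.0]
agent_costs = [1.0, 1.0, 1.0, 5.0]

expert_var, expert_cvar = var_cvar(expert_costs, alpha=0.75)
agent_var, agent_cvar = var_cvar(agent_costs, alpha=0.75)
print("expert:", expert_var, expert_cvar)
print("agent: ", agent_var, agent_cvar)
```

In this toy sample the two cost lists have identical means, yet the second has a much larger CVaR. This is the kind of gap the abstract describes between GAIL agents and the expert: a comparison of average cost alone would not reveal it.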

Code Implementations: 1 repo
