RO LG SY

Apr 8, 2024

SAFE-GIL: SAFEty Guided Imitation Learning for Robotic Systems

Yusuf Umut Ciftci, Darren Chiu, Zeyuan Feng, Gaurav S. Sukhatme, Somil Bansal

arXiv:2404.05249v213.016 citations

h-index: 93

ICRA

Originality: Incremental advance

AI Analysis

This work addresses safety-critical issues in robotic systems, offering a design-time solution to reduce failures, though it is incremental as it builds on existing behavior cloning methods.

The paper tackles safety violations in behavior cloning for robotics by proposing SAFE-GIL, a method that injects adversarial disturbances during data collection to simulate policy errors and guide the expert toward safety-critical states. The authors report a significant reduction in safety failures, especially in low-data regimes.

Behavior cloning (BC) is a widely used approach in imitation learning, where a robot learns a control policy by observing an expert supervisor. However, the learned policy can make errors and might lead to safety violations, which limits its utility in safety-critical robotics applications. While prior works have tried improving a BC policy via additional real or synthetic action labels, adversarial training, or runtime filtering, none of them explicitly focuses on reducing the BC policy's safety violations during training time. We propose SAFE-GIL, a design-time method to learn safety-aware behavior cloning policies. SAFE-GIL deliberately injects adversarial disturbance into the system during data collection to guide the expert towards safety-critical states. This disturbance injection simulates potential policy errors that the system might encounter at test time. By ensuring that training more closely replicates expert behavior in safety-critical states, our approach results in safer policies despite policy errors at test time. We further develop a reachability-based method to compute this adversarial disturbance. We compare SAFE-GIL with various behavior cloning techniques and online safety-filtering methods in three domains: autonomous ground navigation, aircraft taxiing, and aerial navigation on a quadrotor testbed. Our method demonstrates a significant reduction in safety failures, particularly in low data regimes where the likelihood of learning errors, and therefore safety violations, is higher. See our website here: https://y-u-c.github.io/safegil/
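The data-collection idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the corridor dynamics, the proportional expert, the constant disturbance bound `d_max`, and all function names are assumptions made for illustration. In particular, `safety_margin` (distance to a wall at y = 0) is only a stand-in for the reachability value function the paper computes. The sketch shows the disturbance pushing the state down the safety-margin gradient while the recorded labels remain the expert's undisturbed actions.

```python
import math

Y_REF = 1.0  # hypothetical lane center tracked by the toy expert
GAIN = 1.0   # hypothetical proportional gain of the toy expert


def safety_margin(state):
    """Distance to a wall at y = 0 (assumed stand-in for a reachability value function)."""
    return state[1]


def margin_gradient(state):
    """Gradient of safety_margin with respect to (x, y)."""
    return (0.0, 1.0)


def adversarial_disturbance(state, d_max):
    """Bounded disturbance pointing down the safety-margin gradient (toward the wall)."""
    gx, gy = margin_gradient(state)
    norm = math.hypot(gx, gy) or 1.0
    return (-d_max * gx / norm, -d_max * gy / norm)


def expert_action(state):
    """Toy expert: constant forward speed, proportional steering back to Y_REF."""
    return (1.0, GAIN * (Y_REF - state[1]))


def collect_demo(start, d_max, steps=50, dt=0.1):
    """Roll out the expert under injected disturbance and record (state, expert action)."""
    state = start
    data = []
    for _ in range(steps):
        u = expert_action(state)
        d = adversarial_disturbance(state, d_max)
        data.append((state, u))  # the label is the expert's action, not u + d
        state = (state[0] + dt * (u[0] + d[0]), state[1] + dt * (u[1] + d[1]))
    return data


def min_margin(data):
    """Smallest safety margin among the recorded states."""
    return min(safety_margin(s) for s, _ in data)


nominal = collect_demo((0.0, 1.0), d_max=0.0)  # plain behavior-cloning data
guided = collect_demo((0.0, 1.0), d_max=0.5)   # disturbance-guided data
```

In this toy setting, `guided` contains expert labels at states closer to the wall than any state in `nominal`, which is the kind of coverage of safety-critical states the abstract describes. A behavior cloning policy would then be fit to these (state, action) pairs as usual.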
