LG · Jul 19, 2024

OASIS: Conditional Distribution Shaping for Offline Safe Reinforcement Learning

CMU
arXiv:2407.14653v1 · 14 citations · h-index: 20
Originality: Highly original
AI Analysis

This paper addresses the challenge of training safe RL policies from limited or imperfect data. The problem matters for real-world applications such as robotics and autonomous systems, where safety is critical.

The paper tackles offline safe reinforcement learning, where existing methods struggle with imperfect demonstrations. It introduces OASIS, a new paradigm that uses a conditional diffusion model to shape the training data distribution. Agents trained this way achieve high-reward behavior while satisfying safety constraints and outperform baselines.

Offline safe reinforcement learning (RL) aims to train a policy that satisfies constraints using a pre-collected dataset. Most current methods struggle with the mismatch between imperfect demonstrations and the desired safe and rewarding performance. In this paper, we introduce OASIS (cOnditionAl diStributIon Shaping), a new paradigm in offline safe RL designed to overcome these critical limitations. OASIS uses a conditional diffusion model to synthesize offline datasets, thus shaping the data distribution toward a beneficial target domain. Our approach promotes compliance with safety constraints through effective data utilization and regularization techniques that benefit offline safe RL training. Comprehensive evaluations on public benchmarks and a variety of datasets show that OASIS enables offline safe RL agents to achieve high-reward behavior while satisfying the safety constraints, outperforming established baselines. Furthermore, OASIS exhibits high data efficiency and robustness, making it suitable for real-world applications, particularly in tasks where safety is imperative and high-quality demonstrations are scarce.
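The abstract says OASIS synthesizes data with a conditional diffusion model but does not specify the sampler, the conditioning signal, or the network. As a rough illustration of conditional diffusion sampling in general, not the authors' implementation, the sketch below assumes a standard DDPM reverse process with classifier-free guidance and a hypothetical condition vector (for example, a target cost and reward). A toy closed-form noise predictor (`make_point_mass_oracle`, hypothetical) stands in for a trained network. All function names are illustrative.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical noise-predictor signature: (x_t, t, condition) -> predicted noise.
# condition=None requests the unconditional prediction used by classifier-free guidance.
NoisePredictor = Callable[[List[float], int, Optional[Sequence[float]]], List[float]]


def make_schedule(T: int, beta_start: float = 1e-4, beta_end: float = 0.02
                  ) -> Tuple[List[float], List[float], List[float]]:
    """Linear beta schedule (requires T >= 2) plus alphas and cumulative alpha_bars."""
    betas = [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]
    alphas = [1.0 - b for b in betas]
    alpha_bars: List[float] = []
    prod = 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars


def sample_conditional(eps_model: NoisePredictor, dim: int, condition: Sequence[float],
                       T: int = 50, guidance: float = 0.0, seed: int = 0) -> List[float]:
    """DDPM ancestral sampling with classifier-free guidance (assumed scheme)."""
    betas, alphas, alpha_bars = make_schedule(T)
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # x_T ~ N(0, I)
    for t in reversed(range(T)):
        eps_c = eps_model(x, t, condition)
        eps_u = eps_model(x, t, None)
        # Guided noise estimate: (1 + w) * eps_cond - w * eps_uncond.
        eps = [(1.0 + guidance) * c - guidance * u for c, u in zip(eps_c, eps_u)]
        # Reverse-step mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t).
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # simple choice: sigma_t^2 = beta_t
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added at the final step
    return x


def make_point_mass_oracle(T: int, cond_to_target: Callable[[Sequence[float]], List[float]],
                           uncond_target: List[float]) -> NoisePredictor:
    """Toy stand-in for a trained network (hypothetical): the exact noise when the
    data distribution is a single point, cond_to_target(condition) or uncond_target."""
    _, _, alpha_bars = make_schedule(T)

    def eps_model(x: List[float], t: int, condition: Optional[Sequence[float]]) -> List[float]:
        target = uncond_target if condition is None else cond_to_target(condition)
        ab = alpha_bars[t]
        # Invert x_t = sqrt(ab) * x_0 + sqrt(1 - ab) * eps for eps.
        return [(xi - math.sqrt(ab) * mi) / math.sqrt(1.0 - ab) for xi, mi in zip(x, target)]

    return eps_model


# Usage: a 2-D "sample" conditioned on a hypothetical (cost, reward) target.
T = 50
oracle = make_point_mass_oracle(T, lambda c: list(c), uncond_target=[0.0, 0.0])
print(sample_conditional(oracle, dim=2, condition=[0.5, 2.0], T=T))  # ≈ [0.5, 2.0]
```

Classifier-free guidance is used here only because it is a common way to condition a diffusion sampler; the conditioning mechanism in OASIS may differ. With the toy oracle, `guidance=0.0` recovers the condition's target, while `guidance=1.0` returns `2*c - uncond_target`. This shows how guidance extrapolates away from the unconditional distribution.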

Code implementations: 1 repo