LG · AI · RO · Feb 14, 2023

Constrained Decision Transformer for Offline Safe Reinforcement Learning

CMU
arXiv:2302.07351v2 · 88 citations · h-index: 35 · Has Code
AI Analysis

This paper addresses offline safe reinforcement learning for real-world applications where safety constraints are critical. The contribution is incremental in that it builds on existing decision transformer methods.

The paper tackles learning a safe reinforcement learning policy from an offline dataset. It proposes the constrained decision transformer (CDT), which can adjust the trade-off between safety and task performance during deployment. According to the abstract, CDT outperforms its variants and strong offline safe RL baselines by a large margin, using the same hyperparameters across all tasks, while retaining zero-shot adaptation to different constraint thresholds.

Safe reinforcement learning (RL) trains a constraint satisfaction policy by interacting with the environment. We aim to tackle a more challenging problem: learning a safe policy from an offline dataset. We study the offline safe RL problem from a novel multi-objective optimization perspective and propose the $ε$-reducible concept to characterize problem difficulties. The inherent trade-offs between safety and task performance inspire us to propose the constrained decision transformer (CDT) approach, which can dynamically adjust the trade-offs during deployment. Extensive experiments show the advantages of the proposed method in learning an adaptive, safe, robust, and high-reward policy. CDT outperforms its variants and strong offline safe RL baselines by a large margin with the same hyperparameters across all tasks, while keeping the zero-shot adaptation capability to different constraint thresholds, making our approach more suitable for real-world RL under constraints. The code is available at https://github.com/liuzuxin/OSRL.
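
To make the idea of adjusting the safety/performance trade-off at deployment more concrete, here is a minimal sketch of the target bookkeeping a return- and cost-conditioned sequence policy might use. It assumes, in the style of decision transformers, that the policy is conditioned on a return-to-go and a cost-to-go; the helper names `to_go` and `update_targets` are hypothetical and are not taken from the OSRL repository.

```python
from typing import List, Sequence, Tuple


def to_go(values: Sequence[float]) -> List[float]:
    """Suffix sums: out[t] = values[t] + values[t+1] + ... + values[-1]."""
    out = [0.0] * len(values)
    running = 0.0
    # Walk the trajectory backwards, accumulating the remaining total.
    for t in range(len(values) - 1, -1, -1):
        running += values[t]
        out[t] = running
    return out


def update_targets(
    return_to_go: float, cost_to_go: float, reward: float, cost: float
) -> Tuple[float, float]:
    """Assumed deployment rule: subtract what was just observed from each target."""
    return return_to_go - reward, cost_to_go - cost


# Toy offline trajectory (made-up numbers, for illustration only).
rewards = [1.0, 0.5, 2.0]
costs = [0.0, 1.0, 0.0]

returns_to_go = to_go(rewards)  # conditioning tokens for task performance
costs_to_go = to_go(costs)      # conditioning tokens for constraint budget
```

Under this assumption, choosing a different initial cost target at test time is what would let the same trained model be steered toward a different constraint threshold without retraining.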

Code Implementations: 1 repo
