LG | AI | Dec 25, 2024

Constraint-Adaptive Policy Switching for Offline Safe Reinforcement Learning

arXiv:2412.18946v2 | 13 citations | h-index: 15 | Has Code | AAAI
Originality: Incremental advance
AI Analysis

This work addresses flexible safety-constraint adaptation in offline RL, a problem relevant to both researchers and practitioners. The advance is incremental, since it builds on existing algorithms through a wrapper approach.

The paper tackles the challenge of adapting to varying safety constraints in offline safe reinforcement learning without retraining. It introduces a wrapper framework that learns multiple policies and switches between them during deployment, and it reports consistently outperforming existing methods on 38 benchmark tasks.

Offline safe reinforcement learning (OSRL) involves learning a decision-making policy that maximizes rewards from a fixed batch of training data while satisfying pre-defined safety constraints. However, adapting to varying safety constraints during deployment without retraining remains an under-explored challenge. To address this challenge, we introduce constraint-adaptive policy switching (CAPS), a wrapper framework around existing offline RL algorithms. During training, CAPS uses offline data to learn multiple policies with a shared representation that optimize different reward and cost trade-offs. During testing, CAPS switches between those policies by selecting at each state the policy that maximizes future rewards among those that satisfy the current cost constraint. Our experiments on 38 tasks from the DSRL benchmark demonstrate that CAPS consistently outperforms existing methods, establishing a strong wrapper-based baseline for OSRL. The code is publicly available at https://github.com/yassineCh/CAPS.
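
The test-time switching rule described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `select_policy`, the `(reward, cost)` estimate pairs, and the `remaining_budget` parameter are all hypothetical names chosen for illustration. The fallback to the lowest-cost policy when no candidate is feasible is also an assumption, since the abstract does not say how that case is handled.

```python
from typing import Sequence, Tuple

# Hypothetical representation: for the current state, each candidate policy
# comes with an (estimated future reward, estimated future cost) pair.
Estimate = Tuple[float, float]


def select_policy(estimates: Sequence[Estimate], remaining_budget: float) -> int:
    """Return the index of the policy to follow at the current state.

    Among policies whose estimated future cost fits the remaining budget,
    pick the one with the highest estimated future reward.
    """
    if not estimates:
        raise ValueError("at least one candidate policy is required")

    # Policies whose estimated cost satisfies the current constraint.
    feasible = [i for i, (_, cost) in enumerate(estimates) if cost <= remaining_budget]

    if feasible:
        # Highest estimated reward among the feasible policies.
        return max(feasible, key=lambda i: estimates[i][0])

    # Assumption (not stated in the abstract): if nothing is feasible,
    # fall back to the policy with the lowest estimated cost.
    return min(range(len(estimates)), key=lambda i: estimates[i][1])


# Three hypothetical policies, ordered from reward-seeking to cautious.
candidates = [(10.0, 8.0), (6.0, 3.0), (2.0, 0.5)]
choice = select_policy(candidates, remaining_budget=5.0)
```

With a budget of 5.0, only the second and third candidates are feasible, so the sketch picks the second one for its higher estimated reward. Because the budget is an argument rather than a training-time constant, the same set of policies can serve different constraints without retraining, which is the property the abstract emphasizes.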

Code Implementations: 1 repo
