LG | Feb 18

Vulnerability Analysis of Safe Reinforcement Learning via Inverse Constrained Reinforcement Learning

arXiv:2602.16543v1 | h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses a critical security problem for Safe RL systems deployed in adversarial environments. The advance is incremental: it builds on existing gradient-based attack methods and removes their need for access to the victim policy's gradients.

The paper tackles the vulnerability of Safe Reinforcement Learning (Safe RL) policies to adversarial perturbations in real-world settings. It proposes an adversarial attack framework that uses expert demonstrations and black-box interactions to learn a constraint model and a surrogate policy. These learned components enable gradient-based attacks without access to the victim policy's internal gradients or the ground-truth safety constraints. Experiments on multiple benchmarks demonstrate the framework's effectiveness under limited access.

Safe reinforcement learning (Safe RL) aims to ensure policy performance while satisfying safety constraints. However, most existing Safe RL methods assume benign environments, making them vulnerable to adversarial perturbations commonly encountered in real-world settings. In addition, existing gradient-based adversarial attacks typically require access to the policy's gradient information, which is often impractical in real-world scenarios. To address these challenges, we propose an adversarial attack framework to reveal vulnerabilities of Safe RL policies. Using expert demonstrations and black-box environment interaction, our framework learns a constraint model and a surrogate (learner) policy, enabling gradient-based attack optimization without requiring the victim policy's internal gradients or the ground-truth safety constraints. We further provide theoretical analysis establishing feasibility and deriving perturbation bounds. Experiments on multiple Safe RL benchmarks demonstrate the effectiveness of our approach under limited privileged access.
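The idea of attacking through learned stand-ins can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not specify the attack's form, so the example substitutes a single FGSM-style signed-gradient step, and every name in it (`surrogate_action`, `learned_cost`, `fgsm_perturb`, the weights `W` and `c`, the budget `epsilon`) is a hypothetical placeholder. It assumes the surrogate policy and the learned constraint model are already available as differentiable functions, and it perturbs an observation to raise the cost the learned constraint model predicts, using gradients of the surrogate only.

```python
import numpy as np

def surrogate_action(W, obs):
    """Hypothetical surrogate (learner) policy: linear layer followed by tanh."""
    return np.tanh(W @ obs)

def learned_cost(c, action):
    """Hypothetical learned constraint model: linear cost of the action."""
    return float(c @ action)

def cost_gradient_wrt_obs(W, c, obs):
    """Analytic gradient of learned_cost(surrogate_action(obs)) w.r.t. obs."""
    pre_activation = W @ obs
    # Chain rule: d tanh(z)/dz = 1 - tanh(z)^2, then back through the linear layer.
    return W.T @ (c * (1.0 - np.tanh(pre_activation) ** 2))

def fgsm_perturb(W, c, obs, epsilon):
    """One signed-gradient step; the result stays within an L-infinity ball of radius epsilon."""
    grad = cost_gradient_wrt_obs(W, c, obs)
    return obs + epsilon * np.sign(grad)

# Toy values chosen only for illustration (assumptions, not from the paper).
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
c = np.array([1.0, 0.5])
obs = np.zeros(3)
epsilon = 0.05

adv_obs = fgsm_perturb(W, c, obs, epsilon)
cost_before = learned_cost(c, surrogate_action(W, obs))
cost_after = learned_cost(c, surrogate_action(W, adv_obs))
```

In a black-box setting, the perturbed observation would then be fed to the victim policy, in the hope that a perturbation crafted on the surrogate transfers to it. The victim's own gradients are never queried in this sketch.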

