Policy Gradients for Probabilistic Constrained Reinforcement Learning
This work addresses probabilistic safety constraints in RL, which matter for applications such as robotics and autonomous systems, though the contribution appears incremental because it builds on existing policy-based methods.
The paper tackles the problem of learning safe policies in reinforcement learning under probabilistic safety constraints, which require keeping the system state in a safe set with high probability. It claims to provide the first explicit gradient expressions for these constraints and demonstrates empirical feasibility on a continuous navigation problem.
This paper considers the problem of learning safe policies in the context of reinforcement learning (RL). In particular, we consider the notion of probabilistic safety. That is, we aim to design policies that maintain the state of the system in a safe set with high probability. This notion differs from the cumulative constraints often considered in the literature. The challenge of working with probabilistic safety is the lack of explicit expressions for the gradients of such constraints, since policy optimization algorithms rely on gradients of both the objective function and the constraints. To the best of our knowledge, this work is the first to provide such explicit gradient expressions for probabilistic constraints. It is worth noting that the gradient of this family of constraints can be applied to various policy-based algorithms. We demonstrate empirically that it is possible to handle probabilistic constraints in a continuous navigation problem.
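To make the notion of a probabilistic constraint concrete, the following is a minimal sketch, not the paper's derivation or implementation. It assumes a hypothetical scalar toy problem (safe set |s| <= 1, dynamics s' = s + a, Gaussian policy a ~ N(theta * s, sigma^2)) and estimates the probability that a whole trajectory stays safe, together with a gradient of that probability obtained from the standard score-function (likelihood-ratio) identity. The function name `estimate_safety` and all parameters are illustrative assumptions.

```python
import numpy as np

def estimate_safety(theta, n_episodes=2000, horizon=10, sigma=0.3, seed=0):
    """Monte Carlo estimate of P(all states safe) and a score-function gradient.

    Hypothetical toy setup (an assumption, not from the paper):
    scalar state s, safe set |s| <= 1, dynamics s' = s + a,
    Gaussian policy a ~ N(theta * s, sigma^2).
    """
    rng = np.random.default_rng(seed)
    s = np.full(n_episodes, 0.5)             # every episode starts at s0 = 0.5
    safe = np.ones(n_episodes, dtype=bool)   # True while the trajectory has stayed safe
    score = np.zeros(n_episodes)             # running sum of d/dtheta log pi(a_t | s_t)
    for _ in range(horizon):
        a = theta * s + sigma * rng.standard_normal(n_episodes)
        # Score of a Gaussian policy with mean theta * s and fixed std sigma.
        score += (a - theta * s) * s / sigma**2
        s = s + a
        safe &= np.abs(s) <= 1.0
    p_safe = safe.mean()             # estimate of P(s_t in safe set for all t)
    grad = (safe * score).mean()     # estimate of E[1{safe trajectory} * sum_t score_t]
    return p_safe, grad

# Example usage: a stabilizing gain versus a destabilizing one.
p_stable, g_stable = estimate_safety(theta=-1.0)
p_unstable, g_unstable = estimate_safety(theta=0.5)
```

The constraint here is on the probability of the joint event that every state is safe, rather than on a sum of per-step costs, which is the distinction the abstract draws with cumulative constraints. An estimate like `grad` could in principle be plugged into a policy-gradient update alongside the reward gradient, though how the paper combines the two is not specified in this text.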