Recursive Constraints to Prevent Instability in Constrained Reinforcement Learning
This work addresses learning instability in constrained RL for safety-critical robotic applications, representing an incremental improvement over existing methods.
The paper tackles a constrained reinforcement learning problem in which a deterministic policy must maximize one reward uniformly across all states while satisfying a probabilistic constraint on a second reward, a setting that arises in safety-critical robotic systems. The authors propose an algorithm that uses recursive constraints to prevent learning instability. It admits an approximate form that improves efficiency while remaining conservative with respect to the constraint.
We consider the challenge of finding a deterministic policy for a Markov decision process that uniformly (in all states) maximizes one reward subject to a probabilistic constraint over a different reward. Existing solutions do not fully address our precise problem definition, which nevertheless arises naturally in the context of safety-critical robotic systems. This class of problem is known to be hard, and the combined requirements of determinism and uniform optimality can additionally create learning instability. In this work, after describing and motivating our problem with a simple example, we present a suitable constrained reinforcement learning algorithm that prevents learning instability, using recursive constraints. Our proposed approach admits an approximative form that improves efficiency and is conservative with respect to the constraint.
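
For orientation, the problem described above might be written roughly as follows. This is only an illustrative reading of the abstract, not the paper's own formulation: the symbols $\Pi_{\mathrm{det}}$ (the set of deterministic policies), $V_r^{\pi}$ (the value of policy $\pi$ under the objective reward $r$), $G_c$ (the return under the constraint reward $c$), the threshold $\tau$, and the tolerance $\delta$ are all assumed notation, and the exact form of the probabilistic constraint is a guess.

```latex
% Illustrative sketch only; all notation here is assumed, not taken from the paper.
% Find a single deterministic policy \pi^* that, for every state s at once,
% is optimal for reward r among policies meeting the constraint from s.
\[
  \pi^{*} \in \operatorname*{arg\,max}_{\pi \in \Pi_{\mathrm{det}}} V_r^{\pi}(s)
  \quad \text{subject to} \quad
  \Pr\nolimits^{\pi}\!\bigl( G_c \ge \tau \,\big|\, s_0 = s \bigr) \ge 1 - \delta ,
  \qquad \text{for all } s \in \mathcal{S}.
\]
```

Under this reading, "uniform" optimality means the same policy $\pi^{*}$ must solve the constrained problem from every starting state $s$, rather than only in expectation over an initial state distribution.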