Robust Constrained Reinforcement Learning
This work addresses the challenge of ensuring safety and reliability when reinforcement learning is deployed in applications such as robotics or autonomous systems. The contribution is incremental, since it builds on existing constrained RL methods.
The paper tackles the problem of constrained reinforcement learning under model uncertainty, where performance degrades and constraints are violated when the test environment differs from the training one. It proposes a robust framework that guarantees constraint satisfaction for all MDPs in an uncertainty set and maximizes worst-case reward, with theoretical convergence and sample complexity guarantees.
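One way to write the objective described above is sketched below. The notation is an assumption made for illustration and is not taken from the paper: $\mathcal{P}$ denotes the uncertainty set, $\rho$ an initial state distribution, $V_r^{\pi,P}$ and $V_c^{\pi,P}$ the expected reward and utility of policy $\pi$ under transition kernel $P$, and $b$ a lower-bound threshold on the utility.

```latex
\max_{\pi} \; \min_{P \in \mathcal{P}} V_r^{\pi,P}(\rho)
\quad \text{subject to} \quad
\min_{P \in \mathcal{P}} V_c^{\pi,P}(\rho) \ge b .
```

Requiring the constraint to hold for every $P \in \mathcal{P}$ is the same as requiring it for the worst-case $P$, which is why the constraint is written with a minimum.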
Constrained reinforcement learning aims to maximize the expected reward subject to constraints on utilities/costs. However, the training environment may differ from the test environment due to, e.g., modeling error, adversarial attack, or non-stationarity, resulting in severe performance degradation and, more importantly, constraint violation. We propose a framework of robust constrained reinforcement learning under model uncertainty, where the MDP is not fixed but lies in some uncertainty set. The goal is to guarantee that constraints on utilities/costs are satisfied for all MDPs in the uncertainty set, and to maximize the worst-case reward performance over the uncertainty set. We design a robust primal-dual approach and develop theoretical guarantees on its convergence, complexity, and robust feasibility. We then investigate a concrete example, the $\delta$-contamination uncertainty set, design an online and model-free algorithm, and theoretically characterize its sample complexity.
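To make the $\delta$-contamination example concrete, the following is a minimal sketch of worst-case policy evaluation. It is not the paper's algorithm, which is online and model-free; this sketch is model-based and assumes the nominal kernel is known. It also assumes the commonly used definition of the set, which the text above does not spell out: each nominal next-state distribution $p$ is replaced by $\{(1-\delta)p + \delta q\}$ with $q$ ranging over all distributions on states. Under that assumption the worst-case expectation of a value vector $V$ is $(1-\delta)\,p^\top V + \delta \min_s V(s)$. The function name and array layout are hypothetical.

```python
import numpy as np

def robust_policy_evaluation(P, r, pi, delta, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Worst-case value of a fixed policy under an assumed delta-contamination set.

    P:  (S, A, S) nominal transition kernel (assumed known for this sketch)
    r:  (S, A) reward (or utility) table
    pi: (S, A) policy probabilities, rows sum to 1
    """
    V = np.zeros(P.shape[0])
    for _ in range(max_iter):
        # Adversary moves a delta fraction of the mass to the lowest-value state.
        worst_next = (1.0 - delta) * (P @ V) + delta * V.min()  # shape (S, A)
        Q = r + gamma * worst_next
        V_new = (pi * Q).sum(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V
```

Running the same routine on the utility table instead of the reward table gives the worst-case utility, which is the quantity a robust constraint would be checked against. With `delta=0` the routine reduces to standard policy evaluation on the nominal model.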