LG · OC · Mar 23, 2024

Safe Reinforcement Learning for Constrained Markov Decision Processes with Stochastic Stopping Time

Abhijit Mazumdar, Rafal Wisniewski, Manuela L. Bujorianu

arXiv:2403.15928v1 · 7.97 citations · h-index: 13 · CDC

Originality: Incremental advance

AI Analysis

This work targets safety-critical applications of reinforcement learning, such as robotics and autonomous systems, by providing a method that avoids constraint violations during learning. It appears incremental, however, because it builds on existing constrained MDP frameworks.

The paper tackles the problem of learning optimal policies in constrained Markov decision processes (CMDPs) with a stochastic stopping time while keeping the safety constraints satisfied throughout learning. It proposes an online reinforcement learning algorithm based on linear programming that yields policies that are safe with high confidence, and it demonstrates the algorithm's efficacy through simulation results.
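Linear-programming methods for CMDPs usually start from the classical occupation-measure formulation. The block below sketches that standard LP for a finite CMDP in which the stopping time is modeled by absorbing states. The notation (q, r, d, β, D, S′) is our own assumption, written as reward maximization, and the paper's exact formulation may differ.

```latex
\begin{aligned}
\max_{q \ge 0}\quad & \sum_{s \in S'} \sum_{a \in A} r(s,a)\, q(s,a) \\
\text{s.t.}\quad & \sum_{a \in A} q(s',a) - \sum_{s \in S'} \sum_{a \in A} P(s' \mid s,a)\, q(s,a) = \beta(s'), \quad \forall s' \in S', \\
& \sum_{s \in S'} \sum_{a \in A} d(s,a)\, q(s,a) \le D,
\end{aligned}
\qquad
\pi(a \mid s) = \frac{q(s,a)}{\sum_{a' \in A} q(s,a')}.
```

Here S′ is the set of non-absorbing states, q(s,a) is the expected number of visits to the pair (s,a) before the process stops, β is the initial distribution, and d and D are the constraint cost and its budget. A stationary policy is read off q wherever the denominator is positive. In the model-free setting the paper addresses, P is unknown, so the LP coefficients would have to be estimated from data.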

In this paper, we present an online reinforcement learning algorithm for constrained Markov decision processes with a safety constraint. Despite the attention this area has received from the scientific community, learning an optimal policy without violating safety constraints during the learning phase has yet to be addressed for processes with a stochastic stopping time. To this end, we propose an algorithm based on linear programming that does not require a model of the process. We show that the learned policy is safe with high confidence. We also propose a method to compute a safe baseline policy, which is central to developing algorithms that do not violate the safety constraints. Finally, we provide simulation results to show the efficacy of the proposed algorithm. Further, we demonstrate that efficient exploration can be achieved by defining a subset of the state space called the proxy set.
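In the constrained-MDP setting, a policy is typically called safe when its expected constraint cost accumulated up to the stopping time stays within a budget. The sketch below checks this for a fixed stationary policy on a tiny hypothetical model. Unlike the paper's model-free setting, it assumes the transition probabilities are known, and every name and number in it (`evaluate`, `expected_total`, `is_safe`, the toy model) is illustrative only. It shows what "safe" means for a given policy, not how the authors compute their baseline.

```python
from typing import Dict, List, Tuple

# Hypothetical toy CMDP (not from the paper): transient states {0, 1}, actions {0, 1}.
# P[(s, a)][s2] = probability of moving to transient state s2; the leftover mass
# 1 - sum(P[(s, a)]) is the probability that the process stops after (s, a).
Model = Dict[Tuple[int, int], List[float]]
Costs = Dict[Tuple[int, int], float]
Policy = List[List[float]]  # policy[s][a] = probability of taking action a in state s


def evaluate(policy: Policy, P: Model, c: Costs,
             tol: float = 1e-12, max_iter: int = 100_000) -> List[float]:
    """Per-state expected cumulative c until stopping: fixed point of v = c_pi + P_pi v.

    The iteration converges when, under `policy`, the process stops almost
    surely from every state (P_pi then has spectral radius below one).
    """
    n = len(policy)
    v = [0.0] * n
    for _ in range(max_iter):
        new_v = [
            sum(
                p_a * (c[(s, a)] + sum(P[(s, a)][s2] * v[s2] for s2 in range(n)))
                for a, p_a in enumerate(policy[s])
            )
            for s in range(n)
        ]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    raise RuntimeError("no convergence: the process may not stop almost surely")


def expected_total(policy: Policy, P: Model, c: Costs, beta: List[float]) -> float:
    """J_c(pi) = sum_s beta(s) * v_c(s): expected total c from initial distribution beta."""
    return sum(b * v for b, v in zip(beta, evaluate(policy, P, c)))


def is_safe(policy: Policy, P: Model, d: Costs, beta: List[float], budget: float) -> bool:
    """Check the constraint J_d(pi) <= budget, assuming the model P is known."""
    return expected_total(policy, P, d, beta) <= budget


# Toy model, rewards r and constraint costs d (all hypothetical numbers).
P = {(0, 0): [0.5, 0.3], (0, 1): [0.1, 0.1], (1, 0): [0.4, 0.4], (1, 1): [0.0, 0.2]}
r = {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 0.5, (1, 1): 2.0}
d = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 1.0}
beta = [1.0, 0.0]  # always start in state 0

always_a0 = [[1.0, 0.0], [1.0, 0.0]]  # never incurs constraint cost
always_a1 = [[0.0, 1.0], [0.0, 1.0]]  # incurs constraint cost 1 per step

if __name__ == "__main__":
    print(expected_total(always_a0, P, r, beta))  # about 4.1667 (= 25/6)
    print(expected_total(always_a1, P, d, beta))  # about 1.25
    print(is_safe(always_a0, P, d, beta, budget=1.0),
          is_safe(always_a1, P, d, beta, budget=1.0))  # True False
```

Fixed-point iteration is used instead of a direct linear solve to keep the sketch dependency-free; with a budget of 1.0, only the policy that always picks action 0 would qualify as a safe baseline in this toy model.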
