SYLG
Apr 30, 2023

Joint Learning of Policy with Unknown Temporal Constraints for Safe Reinforcement Learning

arXiv:2305.00576v11 citations
h-index: 10
Originality: Incremental advance
AI Analysis

This work addresses safety in reinforcement learning for real-world applications where constraints are not explicitly defined. It is an incremental advance: it combines existing methods and backs the combination with theoretical support.

The paper tackles the problem of learning safe reinforcement learning policies when the safety constraints are unknown. It proposes a framework that jointly learns the safety constraints and the optimal policy, and it identifies both successfully in grid-world environments. The joint learning process comes with theoretical convergence guarantees and error bounds.

In many real-world applications, safety constraints for reinforcement learning (RL) algorithms are either unknown or not explicitly defined. We propose a framework that concurrently learns safety constraints and optimal RL policies in such environments, supported by theoretical guarantees. Our approach merges a logically constrained RL algorithm with an evolutionary algorithm to synthesize signal temporal logic (STL) specifications. The framework is underpinned by theorems that establish the convergence of our joint learning process and provide error bounds between the discovered policy and the true optimal policy. We demonstrate our framework in grid-world environments, successfully identifying both acceptable safety constraints and RL policies and showing that the theorems hold up in practice.
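To make the idea of synthesizing an STL safety specification concrete, the following is a minimal sketch and not the paper's implementation. It assumes a single hypothetical specification template, "always keep a Manhattan distance greater than `margin` from one unsafe cell", and it replaces the evolutionary algorithm with an exhaustive search over a handful of candidate margins. All function names, the template, and the labeled trajectories are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Cell = Tuple[int, int]  # (row, column) position in a grid world


def manhattan(a: Cell, b: Cell) -> int:
    """Manhattan distance between two grid cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def always_avoid_robustness(trajectory: Sequence[Cell], unsafe: Cell, margin: int) -> int:
    """Robustness of the STL formula G (dist(state, unsafe) > margin).

    The predicate's robustness at one step is dist - margin, and the
    'always' operator takes the minimum over the whole trajectory.
    A positive value means the trajectory satisfies the formula.
    """
    return min(manhattan(state, unsafe) - margin for state in trajectory)


def pick_margin(
    safe_trajs: List[Sequence[Cell]],
    unsafe_trajs: List[Sequence[Cell]],
    unsafe: Cell,
    candidates: Sequence[int],
) -> int:
    """Pick the candidate margin that classifies the most labeled trajectories correctly.

    Assumption: a plain exhaustive search stands in for the evolutionary
    algorithm mentioned in the abstract.
    """

    def score(margin: int) -> int:
        accepted = sum(always_avoid_robustness(t, unsafe, margin) > 0 for t in safe_trajs)
        rejected = sum(always_avoid_robustness(t, unsafe, margin) <= 0 for t in unsafe_trajs)
        return accepted + rejected

    return max(candidates, key=score)


# Hypothetical labeled data: one trajectory judged safe, one judged unsafe.
unsafe_cell = (2, 2)
safe_traj = [(0, 0), (0, 1), (0, 2)]
risky_traj = [(0, 0), (1, 1), (2, 1)]

learned_margin = pick_margin([safe_traj], [risky_traj], unsafe_cell, candidates=[0, 1, 2, 3])
```

In a joint learning loop of the kind the abstract describes, a robustness value like this could serve as the fitness signal for candidate specifications and as a constraint signal for the policy learner; how the paper actually couples the two is not specified in this summary.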
