ROSY Feb 19, 2021

Probabilistically Guaranteed Satisfaction of Temporal Logic Constraints During Reinforcement Learning

arXiv:2102.10063v2, 15 citations
AI Analysis

This paper addresses the challenge of safe and reliable policy learning in autonomous systems such as drones, though it appears incremental, building on existing constrained RL and automata-theoretic approaches.

The paper tackles the problem of ensuring probabilistic satisfaction of temporal logic constraints during reinforcement learning. It proposes a method that guarantees constraint satisfaction with a desired probability in each episode, and demonstrates it in a drone scenario that combines periodically arriving pick-up and delivery tasks with aerial monitoring of high-reward zones.

We propose a novel constrained reinforcement learning method for finding optimal policies in Markov Decision Processes while satisfying temporal logic constraints with a desired probability throughout the learning process. An automata-theoretic approach is proposed to ensure the probabilistic satisfaction of the constraint in each episode, which is different from penalizing violations to achieve constraint satisfaction after a sufficiently large number of episodes. The proposed approach is based on computing a lower bound on the probability of constraint satisfaction and adjusting the exploration behavior as needed. We present theoretical results on the probabilistic constraint satisfaction achieved by the proposed approach. We also numerically demonstrate the proposed idea in a drone scenario, where the constraint is to perform periodically arriving pick-up and delivery tasks and the objective is to fly over high-reward zones to simultaneously perform aerial monitoring.
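The idea of computing a satisfaction-probability bound and adjusting exploration accordingly can be illustrated with a small sketch. This is not the paper's algorithm and does not reproduce its guarantee. It assumes a known transition model, a finite step budget, and a constraint that has already been reduced to reaching a goal set (as one might obtain from a product with an automaton). The names `reach_probabilities`, `allowed_actions`, `choose_action`, and the three-state chain are hypothetical and chosen only for illustration.

```python
import random

# Illustrative sketch only: the known transition model, the finite step budget,
# and the toy chain below are assumptions, not details taken from the paper.

def reach_probabilities(P, goal, horizon):
    """Return V where V[k][s] is the maximum probability of reaching `goal`
    from state s within k steps. P[a][s] maps next_state -> probability."""
    n_actions, n_states = len(P), len(P[0])
    V = [[1.0 if s in goal else 0.0 for s in range(n_states)]]
    for _ in range(horizon):
        prev = V[-1]
        V.append([
            1.0 if s in goal else max(
                sum(p * prev[t] for t, p in P[a][s].items())
                for a in range(n_actions)
            )
            for s in range(n_states)
        ])
    return V

def action_value(P, V, s, a, steps_left):
    """Probability of reaching the goal if action a is taken now (steps_left >= 1)
    and the reach-maximizing policy is followed afterwards."""
    return sum(p * V[steps_left - 1][t] for t, p in P[a][s].items())

def allowed_actions(P, V, s, steps_left, p_des):
    """Actions whose continuation probability stays at or above p_des.
    If none qualifies, fall back to the single reach-maximizing action."""
    actions = range(len(P))
    ok = [a for a in actions if action_value(P, V, s, a, steps_left) >= p_des]
    if ok:
        return ok
    return [max(actions, key=lambda a: action_value(P, V, s, a, steps_left))]

def choose_action(q_values, allowed, epsilon, rng=random):
    """Epsilon-greedy selection restricted to the allowed action set."""
    if rng.random() < epsilon:
        return rng.choice(allowed)
    return max(allowed, key=lambda a: q_values[a])

# Toy chain with states 0, 1, 2 and goal state 2.
# Action 0 ("advance") moves one state forward with probability 0.9.
# Action 1 ("loiter") stays in place, standing in for reward-seeking exploration.
P = [
    [{1: 0.9, 0: 0.1}, {2: 0.9, 1: 0.1}, {2: 1.0}],  # advance
    [{0: 1.0}, {1: 1.0}, {2: 1.0}],                  # loiter
]
V = reach_probabilities(P, goal={2}, horizon=4)
```

With four steps left in state 0 and a desired probability of 0.9, both actions pass the check, so the learner is free to explore. With three steps left, loitering would drop the achievable probability to 0.81, so only advancing remains allowed. The design choice in this sketch is to restrict the exploration set rather than penalize violations in the reward.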

