LG · AI · LO · Sep 21, 2022

LCRL: Certified Policy Synthesis via Logically-Constrained Reinforcement Learning

arXiv:2209.10341v1 · 19 citations · h-index: 55 · Has Code
Originality: Incremental advance
AI Analysis

This paper provides a certified policy synthesis method for applications that require temporal logic constraints, such as robotics and autonomous systems. The contribution is incremental, however, as it builds on existing RL and automata techniques.

The authors tackle the problem of synthesizing policies for unknown Markov Decision Processes (MDPs) that satisfy linear temporal specifications with maximal probability. They develop LCRL, a software tool that applies model-free reinforcement learning with rewards shaped by Limit Deterministic Büchi Automata (LDBA), and report robust and scalable performance compared with standard RL approaches.
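The acceptance condition behind an LDBA can be made concrete with a small example. The sketch below is a hypothetical, hand-built automaton for the reach-avoid formula `F goal & G !unsafe`; because this formula admits a deterministic Büchi automaton, which is a special case of an LDBA, no non-deterministic part is needed. A run is accepting when it visits an accepting state infinitely often, which for an ultimately periodic (lasso) word `prefix · cycle^ω` can be decided by simulating the cycle until the automaton state at the start of a pass repeats. The names `delta` and `accepts_lasso` are illustrative assumptions, not part of LCRL.

```python
from typing import Dict, FrozenSet, List

Label = FrozenSet[str]  # set of atomic propositions true at one time step

# Hypothetical automaton for  F goal & G !unsafe  (illustrative, not LCRL's):
#   0 = goal not yet seen, 1 = goal seen and still safe (accepting), 2 = rejecting sink
ACCEPTING = {1}


def delta(q: int, label: Label) -> int:
    """Deterministic transition function over sets of atomic propositions."""
    if q == 2 or "unsafe" in label:
        return 2  # any unsafe step violates G !unsafe forever
    if q == 1 or "goal" in label:
        return 1  # goal has been reached and nothing unsafe happened so far
    return 0


def accepts_lasso(prefix: List[Label], cycle: List[Label], q0: int = 0) -> bool:
    """Büchi acceptance of the infinite word prefix . cycle^omega."""
    if not cycle:
        raise ValueError("cycle must be non-empty")
    q = q0
    for label in prefix:
        q = delta(q, label)
    seen: Dict[int, int] = {}  # automaton state at the start of a pass -> pass index
    visits: List[bool] = []    # did pass i visit an accepting state?
    while q not in seen:
        seen[q] = len(visits)
        hit = False
        for label in cycle:
            q = delta(q, label)
            hit = hit or q in ACCEPTING
        visits.append(hit)
    # From pass seen[q] onward the run repeats forever, so acceptance is
    # decided by whether any of those repeating passes hits an accepting state.
    return any(visits[seen[q]:])
```

For instance, the word "idle, then goal, then idle forever" is accepted, while a word that keeps revisiting `unsafe` after reaching `goal` is rejected, since the run falls into the sink state 2.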

LCRL is a software tool that implements model-free Reinforcement Learning (RL) algorithms over unknown Markov Decision Processes (MDPs), synthesising policies that satisfy a given linear temporal specification with maximal probability. LCRL leverages partially deterministic finite-state machines known as Limit Deterministic Büchi Automata (LDBA) to express a given linear temporal specification. A reward function for the RL algorithm is shaped on-the-fly, based on the structure of the LDBA. Theoretical guarantees under proper assumptions ensure the convergence of the RL algorithm to an optimal policy that maximises the satisfaction probability. We present case studies to demonstrate the applicability, ease of use, scalability, and performance of LCRL. Owing to the LDBA-guided exploration and LCRL's model-free architecture, we observe robust performance, which also scales well when compared to standard RL approaches (whenever applicable to LTL specifications). Full instructions on how to execute all the case studies in this paper are provided on a GitHub page that accompanies the LCRL distribution www.github.com/grockious/lcrl.
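The idea of learning over the product of an unknown MDP and an automaton, with the reward read off the automaton rather than hand-written per MDP state, can be sketched with plain tabular Q-learning. This is a minimal sketch under stated assumptions, not LCRL's code, API, or exact reward scheme (LCRL's own on-the-fly shaping is more elaborate): the five-cell corridor MDP, its slip probability, the hand-built automaton for `F goal & G !unsafe`, and all hyperparameters are hypothetical. The agent only samples transitions, so it stays model-free, and it receives reward 1 exactly when the automaton enters its accepting state.

```python
import random
from typing import Dict, FrozenSet, Tuple

# Hypothetical corridor MDP: cells 0..4; cell 0 is labelled "unsafe", cell 4 "goal".
# Both end cells are absorbing, so entering the accepting automaton state can safely
# end an episode (the run then stays at the goal forever).
N_CELLS, START, SLIP = 5, 2, 0.1
ACTIONS = (-1, +1)  # move left, move right
ACCEPTING, REJECTING = 1, 2


def label(cell: int) -> FrozenSet[str]:
    if cell == 0:
        return frozenset({"unsafe"})
    if cell == N_CELLS - 1:
        return frozenset({"goal"})
    return frozenset()


def mdp_step(cell: int, action: int, rng: random.Random) -> int:
    """Sampled transition: the intended move succeeds with prob 1 - SLIP, else it is reversed."""
    if cell in (0, N_CELLS - 1):  # absorbing end cells
        return cell
    move = action if rng.random() >= SLIP else -action
    return cell + move


def delta(q: int, lab: FrozenSet[str]) -> int:
    """Hypothetical deterministic Büchi automaton for F goal & G !unsafe."""
    if q == REJECTING or "unsafe" in lab:
        return REJECTING
    if q == ACCEPTING or "goal" in lab:
        return ACCEPTING
    return 0


QTable = Dict[Tuple[int, int, int], float]  # (cell, automaton state, action) -> value


def greedy(Q: QTable, cell: int, q: int) -> int:
    return max(ACTIONS, key=lambda a: Q.get((cell, q, a), 0.0))


def train(episodes=5000, alpha=0.1, gamma=0.9, eps=0.2, max_steps=50, seed=0) -> QTable:
    rng = random.Random(seed)
    Q: QTable = {}
    for _ in range(episodes):
        cell, q = START, delta(0, label(START))  # product state (cell, q)
        for _ in range(max_steps):
            if q in (ACCEPTING, REJECTING):
                break
            a = rng.choice(ACTIONS) if rng.random() < eps else greedy(Q, cell, q)
            nxt = mdp_step(cell, a, rng)
            q_next = delta(q, label(nxt))  # automaton tracks the specification
            r = 1.0 if q_next == ACCEPTING else 0.0  # reward derived from the automaton
            done = q_next in (ACCEPTING, REJECTING)
            future = 0.0 if done else max(Q.get((nxt, q_next, b), 0.0) for b in ACTIONS)
            old = Q.get((cell, q, a), 0.0)
            Q[(cell, q, a)] = old + alpha * (r + gamma * future - old)
            cell, q = nxt, q_next
    return Q


def satisfaction_rate(Q: QTable, runs=2000, max_steps=50, seed=1) -> float:
    """Monte Carlo estimate of the probability that the greedy policy satisfies the formula."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(runs):
        cell, q = START, delta(0, label(START))
        for _ in range(max_steps):
            if q in (ACCEPTING, REJECTING):
                break
            cell = mdp_step(cell, greedy(Q, cell, q), rng)
            q = delta(q, label(cell))
        wins += q == ACCEPTING
    return wins / runs
```

In this toy setting the learned greedy policy should move right in every interior cell; for that policy the exact satisfaction probability from cell 2 is about 0.99 (a gambler's-ruin calculation with success probability 0.9 per step), so the Monte Carlo estimate should land close to that value.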

Code implementations: 1 repo