LG | ML | Feb 2, 2019

Certified Reinforcement Learning with Logic Guidance

arXiv:1902.00778v468 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for formal safety guarantees when RL is applied to control problems. The advance is incremental, since it builds on existing LTL and automaton-based methods.

The paper tackles the problem of applying reinforcement learning in safety-critical domains. It proposes a model-free RL algorithm that uses Linear Temporal Logic (LTL) to specify goals for unknown continuous-state/action MDPs. Under certain assumptions, the synthesised control policy satisfies the LTL specification with maximal probability.

Reinforcement Learning (RL) is a widely employed machine learning architecture that has been applied to a variety of control problems. However, applications in safety-critical domains require a systematic and formal approach to specifying requirements as tasks or goals. We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs). The given LTL property is translated into a Limit-Deterministic Generalised Büchi Automaton (LDGBA), which is then used to shape a synchronous reward function on-the-fly. Under certain assumptions, the algorithm is guaranteed to synthesise a control policy whose traces satisfy the LTL specification with maximal probability.
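To make the idea of an automaton-shaped reward concrete, here is a minimal Python sketch. It is not the paper's implementation. The toy automaton for the formula "GF a and GF b", the labels "a", "b" and "none", the class names `Automaton` and `RewardShaper`, and the reward value 1.0 are all assumptions chosen for illustration. The sketch steps the automaton in sync with the labels the agent observes and pays a reward each time a not-yet-visited accepting set is reached. This simplifies the reward scheme, and the abstract does not spell out the exact rule.

```python
from dataclasses import dataclass


@dataclass
class Automaton:
    """Toy deterministic generalised Büchi automaton (hypothetical example)."""
    initial: int
    transitions: dict      # (state, label) -> next state
    accepting_sets: list   # each set must be visited infinitely often

    def step(self, state, label):
        return self.transitions[(state, label)]


class RewardShaper:
    """Tracks the automaton state alongside the MDP and emits rewards."""

    def __init__(self, automaton, reward=1.0):
        self.automaton = automaton
        self.reward = reward  # assumed constant; any positive value works here
        self.reset()

    def reset(self):
        self.q = self.automaton.initial
        # Accepting sets that still need a visit in the current round.
        self.frontier = [set(s) for s in self.automaton.accepting_sets]
        return self.q

    def step(self, label):
        """Advance the automaton on one observed label; return (state, reward)."""
        self.q = self.automaton.step(self.q, label)
        if not any(self.q in s for s in self.frontier):
            return self.q, 0.0
        # Drop the sets just visited; start a new round once all are covered.
        self.frontier = [s for s in self.frontier if self.q not in s]
        if not self.frontier:
            self.frontier = [set(s) for s in self.automaton.accepting_sets]
        return self.q, self.reward


# Automaton for "visit a infinitely often and b infinitely often":
# state 1 means "just saw a", state 2 means "just saw b", state 0 otherwise.
NEXT_STATE = {"a": 1, "b": 2, "none": 0}
toy = Automaton(
    initial=0,
    transitions={(q, l): nxt for q in (0, 1, 2) for l, nxt in NEXT_STATE.items()},
    accepting_sets=[{1}, {2}],
)

shaper = RewardShaper(toy)
rewards = [shaper.step(label)[1] for label in ["a", "a", "b", "none", "b", "a"]]
print(rewards)
```

In a full training loop the learner would act on the pair (MDP state, automaton state), so that the policy can depend on how much of the specification has already been satisfied. Repeating "a" without visiting "b" earns nothing extra in this sketch, which pushes the agent towards traces that keep visiting every accepting set.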

Code Implementations: 1 repo
