SYAILGLOJan 14, 2020

Reinforcement Learning of Control Policy for Linear Temporal Logic Specifications Using Limit-Deterministic Generalized Büchi Automata

arXiv:2001.04669v329 citations
AI Analysis

This addresses the challenge of sparse rewards in reinforcement learning for formal verification tasks, but it is incremental as it builds on existing automaton-based methods with specific enhancements.

The paper tackles the problem of synthesizing control policies for systems modeled as Markov decision processes to satisfy linear temporal logic specifications, by proposing a reinforcement learning method that relaxes reward sparsity and learns optimal circulations among accepting sets, showing it can learn an optimal policy with a discount factor close to one.

This letter proposes a novel reinforcement learning method for the synthesis of a control policy satisfying a control specification described by a linear temporal logic formula. We assume that the controlled system is modeled by a Markov decision process (MDP). We convert the specification to a limit-deterministic generalized Büchi automaton (LDGBA) with several accepting sets that accepts all infinite sequences satisfying the formula. The LDGBA is augmented so that it explicitly records the previous visits to accepting sets. We take a product of the augmented LDGBA and the MDP, based on which we define a reward function. The agent gets rewards whenever state transitions are in an accepting set that has not been visited for a certain number of steps. Consequently, sparsity of rewards is relaxed and optimal circulations among the accepting sets are learned. We show that the proposed method can learn an optimal policy when the discount factor is sufficiently close to one.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes