Omega-Regular Objectives in Model-Free Reinforcement Learning
This addresses a foundational challenge in formal verification and control for unknown systems, representing a novel method rather than incremental work.
The paper tackles the problem of model-free reinforcement learning for ω-regular objectives in Markov decision processes by providing the first solution that reduces satisfaction to an almost-sure reachability problem and uses limit-deterministic Buechi automata instead of Rabin automata, with experimental evaluation on benchmark problems.
We provide the first solution for model-free reinforcement learning of ω-regular objectives for Markov decision processes (MDPs). We present a constructive reduction from the almost-sure satisfaction of ω-regular objectives to an almost- sure reachability problem and extend this technique to learning how to control an unknown model so that the chance of satisfying the objective is maximized. A key feature of our technique is the compilation of ω-regular properties into limit- deterministic Buechi automata instead of the traditional Rabin automata; this choice sidesteps difficulties that have marred previous proposals. Our approach allows us to apply model-free, off-the-shelf reinforcement learning algorithms to compute optimal strategies from the observations of the MDP. We present an experimental evaluation of our technique on benchmark learning problems.