LG · LO · SY · Jun 16, 2021

Mungojerrie: Reinforcement Learning of Linear-Time Objectives

arXiv:2106.09161v2 · 10 citations
AI Analysis

This work addresses reward design for reinforcement learning, a task that is non-trivial and error-prone for practitioners, though it appears incremental because it builds on existing formal methods and tools.

The paper tackles the problem of manually designing reward schemes for reinforcement learning by introducing Mungojerrie, a tool that compiles formal ω-regular objectives into reward schemes, enabling automated synthesis of controllers without prior system knowledge.

Reinforcement learning synthesizes controllers without prior knowledge of the system. At each timestep, a reward is given. The controllers optimize the discounted sum of these rewards. Applying this class of algorithms requires designing a reward scheme, which is typically done manually. The designer must ensure that their intent is accurately captured. This may not be trivial, and is prone to error. An alternative to this manual programming, akin to programming directly in assembly, is to specify the objective in a formal language and have it "compiled" to a reward scheme. Mungojerrie (https://plv.colorado.edu/mungojerrie/) is a tool for testing reward schemes for $ω$-regular objectives on finite models. The tool contains reinforcement learning algorithms and a probabilistic model checker. Mungojerrie supports models specified in PRISM and $ω$-automata specified in HOA.
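The "discounted sum of these rewards" mentioned in the abstract can be illustrated with a minimal sketch. The function name `discounted_return`, the discount factor `gamma`, and the toy episode are assumptions made for illustration; none of them come from Mungojerrie or from the paper.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return sum over t of gamma**t * rewards[t] for a finite reward sequence."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    # Accumulate from the last step backwards: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical episode: the only reward is given at the third timestep.
episode = [0.0, 0.0, 1.0]
result = discounted_return(episode, gamma=0.5)
print(result)
```

With `gamma` below 1, the later a reward arrives, the less it contributes to the return. This is one reason a hand-written reward scheme may not capture what the designer intended.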
