LG · AI · Oct 16, 2024

Reinforcement Learning with LTL and $ω$-Regular Objectives via Optimality-Preserving Translation to Average Rewards

arXiv:2410.12175v1 · 14 citations · h-index: 1 · NIPS
Originality: Highly original
AI Analysis

This paper addresses the challenge of explainability in RL with formal specifications, offering a foundational method with broad applicability.

The paper tackles the problem of learning optimal policies in reinforcement learning with LTL and ω-regular objectives. It reduces the problem to a limit-average reward problem via reward machines and shows that optimal policies can be learned asymptotically.

Linear temporal logic (LTL) and, more generally, $ω$-regular objectives are alternatives to the traditional discount sum and average reward objectives in reinforcement learning (RL), offering the advantage of greater comprehensibility and hence explainability. In this work, we study the relationship between these objectives. Our main result is that each RL problem for $ω$-regular objectives can be reduced to a limit-average reward problem in an optimality-preserving fashion, via (finite-memory) reward machines. Furthermore, we demonstrate the efficacy of this approach by showing that optimal policies for limit-average problems can be found asymptotically by solving a sequence of discount-sum problems approximately. Consequently, we resolve an open problem: optimal policies for LTL and $ω$-regular objectives can be learned asymptotically.
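The link between limit-average and discount-sum objectives mentioned in the abstract can be illustrated on a toy example. The sketch below is not the paper's algorithm: it only demonstrates the classical fact that, for a fixed policy on a finite Markov chain, the normalized discounted value $(1-\gamma)V_\gamma$ approaches the long-run average reward as $\gamma \to 1$. The two-state chain, its rewards, and the function name `normalized_discounted_value` are all hypothetical, chosen purely for illustration.

```python
def normalized_discounted_value(P, r, gamma, start=0, tol=1e-10, max_iters=1_000_000):
    """Return (1 - gamma) * V_gamma(start) for a fixed-policy Markov chain.

    P[s][t] is the probability of moving from state s to state t,
    r[s] is the reward collected in state s (illustrative assumptions).
    """
    n = len(r)
    v = [0.0] * n
    for _ in range(max_iters):
        # One sweep of policy evaluation: v <- r + gamma * P v
        new_v = [r[s] + gamma * sum(P[s][t] * v[t] for t in range(n)) for s in range(n)]
        converged = max(abs(a - b) for a, b in zip(new_v, v)) < tol
        v = new_v
        if converged:
            break
    return (1.0 - gamma) * v[start]


# Hypothetical two-state chain with reward 1 only in state 1.
# Its stationary distribution is (5/6, 1/6), so the average reward is 1/6.
P = [[0.9, 0.1],
     [0.5, 0.5]]
r = [0.0, 1.0]

for gamma in (0.9, 0.99, 0.999):
    # The printed values move toward the average reward 1/6 as gamma grows.
    print(gamma, normalized_discounted_value(P, r, gamma))
```

In this toy setting, choosing discount factors ever closer to 1 and evaluating each discounted problem approximately yields values that converge to the limit-average value, which is the intuition behind solving a sequence of discount-sum problems.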

