LG · LO · Oct 18, 2023

A PAC Learning Algorithm for LTL and Omega-regular Objectives in MDPs

arXiv:2310.12248v3 · 11 citations · h-index: 50
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of expressing non-Markovian objectives in reinforcement learning, with applications such as robotics and verification. It appears incremental, since it builds on existing LTL and omega-regular frameworks.

The paper tackles the problem of learning policies for omega-regular objectives in Markov decision processes (MDPs) by introducing a model-based PAC learning algorithm, proving it requires a polynomial number of samples and validating it experimentally.

Linear temporal logic (LTL) and omega-regular objectives -- a superset of LTL -- have seen recent use as a way to express non-Markovian objectives in reinforcement learning. We introduce a model-based probably approximately correct (PAC) learning algorithm for omega-regular objectives in Markov decision processes (MDPs). As part of the development of our algorithm, we introduce the epsilon-recurrence time: a measure of the speed at which a policy converges to the satisfaction of the omega-regular objective in the limit. We prove that our algorithm only requires a polynomial number of samples in the relevant parameters, and perform experiments which confirm our theory.
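To make the "model-based" and "polynomial number of samples" ideas concrete, here is a minimal sketch of the generic ingredient such algorithms usually rely on: estimating transition probabilities from samples, with the sample count chosen by Hoeffding's inequality. This is not the paper's algorithm and does not use the epsilon-recurrence time. The function names, the toy two-state MDP, and the assumption that next states can be sampled from any state-action pair on demand are all hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter


def hoeffding_samples(eps, delta):
    """Samples needed so that one empirical probability is within eps
    of the true value with probability at least 1 - delta
    (from Hoeffding: 2 * exp(-2 * n * eps**2) <= delta)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def estimate_transitions(sample_next_state, states, actions, n):
    """Draw n next-state samples per (state, action) pair and return
    the empirical transition probabilities as nested dicts."""
    model = {}
    for s in states:
        for a in actions:
            counts = Counter(sample_next_state(s, a) for _ in range(n))
            model[(s, a)] = {t: c / n for t, c in counts.items()}
    return model


def toy_sampler(state, action):
    """Hypothetical two-state MDP: action 'go' moves to state 1 with
    probability 0.8, action 'stay' keeps the current state."""
    if action == "go":
        return 1 if random.random() < 0.8 else 0
    return state


if __name__ == "__main__":
    n = hoeffding_samples(eps=0.1, delta=0.05)
    model = estimate_transitions(toy_sampler, [0, 1], ["go", "stay"], n)
    print(n, model[(0, "go")])
```

The sample count grows as 1/eps^2 and only logarithmically in 1/delta, which is the usual shape of a polynomial sample bound. For an omega-regular objective, the learned model would then be combined with an automaton for the objective, a step this sketch leaves out.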

