LG LO Oct 18, 2023

A PAC Learning Algorithm for LTL and Omega-regular Objectives in MDPs

Mateo Perez, Fabio Somenzi, Ashutosh Trivedi

arXiv:2310.12248v313.011 citations · h-index: 50

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of expressing non-Markovian objectives in reinforcement learning, with applications such as robotics and verification. The advance appears incremental, since it builds on existing LTL and omega-regular frameworks.

The paper tackles the problem of learning policies for omega-regular objectives in Markov decision processes (MDPs) by introducing a model-based PAC learning algorithm, proving it requires a polynomial number of samples and validating it experimentally.

Linear temporal logic (LTL) and omega-regular objectives -- a superset of LTL -- have seen recent use as a way to express non-Markovian objectives in reinforcement learning. We introduce a model-based probably approximately correct (PAC) learning algorithm for omega-regular objectives in Markov decision processes (MDPs). As part of the development of our algorithm, we introduce the epsilon-recurrence time: a measure of the speed at which a policy converges to the satisfaction of the omega-regular objective in the limit. We prove that our algorithm only requires a polynomial number of samples in the relevant parameters, and perform experiments which confirm our theory.
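To give a feel for what "a polynomial number of samples" means in a model-based PAC setting, here is a minimal Python sketch. It is not the paper's algorithm. It shows only the standard Hoeffding bound for estimating a single transition probability to within `eps` with confidence `1 - delta`, plus an empirical estimate of one state-action pair's next-state distribution. The names `hoeffding_samples`, `estimate_transitions`, and `sample_next_state` are hypothetical. The paper's actual bound also depends on quantities such as the epsilon-recurrence time, which this sketch does not model.

```python
import math
from collections import Counter


def hoeffding_samples(eps, delta):
    """Number of i.i.d. samples so that an empirical probability is within
    eps of the true value with probability at least 1 - delta
    (two-sided Hoeffding bound: n >= ln(2/delta) / (2 * eps^2))."""
    if not (0 < eps < 1 and 0 < delta < 1):
        raise ValueError("eps and delta must lie in (0, 1)")
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def estimate_transitions(sample_next_state, n):
    """Empirical next-state distribution for one (state, action) pair.

    sample_next_state is an assumed zero-argument callable that draws one
    successor state from the unknown MDP (for example, a simulator call).
    """
    counts = Counter(sample_next_state() for _ in range(n))
    return {state: count / n for state, count in counts.items()}
```

The sample count grows as 1/eps^2 and only logarithmically in 1/delta, which is the usual shape of a polynomial PAC bound. A full learner would repeat the estimate for every state-action pair and then solve the learned model against the omega-regular objective.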
