AI LGNov 18, 2024

Regret-Free Reinforcement Learning for LTL Specifications

Rupak Majumdar, Mahmoud Salamati, Sadegh Soudjani

arXiv:2411.12019v24.21 citationsh-index: 60

Originality Highly original

AI Analysis

This addresses the challenge of high-level temporal control in unknown dynamical systems for control theory, offering a non-incremental advance with practical guarantees.

The paper tackles the problem of learning controllers for linear temporal logic (LTL) specifications in systems with unknown dynamics, presenting the first regret-free online algorithm that provides sharp bounds on optimal behavior after finite learning episodes, unlike previous asymptotic approaches.

Learning to control an unknown dynamical system with respect to high-level temporal specifications is an important problem in control theory. We present the first regret-free online algorithm for learning a controller for linear temporal logic (LTL) specifications for systems with unknown dynamics. We assume that the underlying (unknown) dynamics is modeled by a finite-state and action Markov decision process (MDP). Our core technical result is a regret-free learning algorithm for infinite-horizon reach-avoid problems on MDPs. For general LTL specifications, we show that the synthesis problem can be reduced to a reach-avoid problem once the graph structure is known. Additionally, we provide an algorithm for learning the graph structure, assuming knowledge of a minimum transition probability, which operates independently of the main regret-free algorithm. Our LTL controller synthesis algorithm provides sharp bounds on how close we are to achieving optimal behavior after a finite number of learning episodes. In contrast, previous algorithms for LTL synthesis only provide asymptotic guarantees, which give no insight into the transient performance during the learning phase.

View on arXiv PDF

Similar