Regret-Free Reinforcement Learning for LTL Specifications
This addresses the challenge of high-level temporal control in unknown dynamical systems for control theory, offering a non-incremental advance with practical guarantees.
The paper tackles the problem of learning controllers for linear temporal logic (LTL) specifications in systems with unknown dynamics, presenting the first regret-free online algorithm that provides sharp bounds on optimal behavior after finite learning episodes, unlike previous asymptotic approaches.
Learning to control an unknown dynamical system with respect to high-level temporal specifications is an important problem in control theory. We present the first regret-free online algorithm for learning a controller for linear temporal logic (LTL) specifications for systems with unknown dynamics. We assume that the underlying (unknown) dynamics is modeled by a finite-state and action Markov decision process (MDP). Our core technical result is a regret-free learning algorithm for infinite-horizon reach-avoid problems on MDPs. For general LTL specifications, we show that the synthesis problem can be reduced to a reach-avoid problem once the graph structure is known. Additionally, we provide an algorithm for learning the graph structure, assuming knowledge of a minimum transition probability, which operates independently of the main regret-free algorithm. Our LTL controller synthesis algorithm provides sharp bounds on how close we are to achieving optimal behavior after a finite number of learning episodes. In contrast, previous algorithms for LTL synthesis only provide asymptotic guarantees, which give no insight into the transient performance during the learning phase.