LOGTLGSYMar 5, 2015

Correct-by-synthesis reinforcement learning with temporal logic constraints

arXiv:1503.01793v163 citations
Originality Incremental advance
AI Analysis

This addresses the challenge of ensuring safety and performance in autonomous systems like robotics, though it is incremental as it builds on existing temporal logic and reinforcement learning methods.

The paper tackles the problem of synthesizing reactive controllers that optimize an unknown performance criterion while satisfying temporal logic constraints in uncontrolled environments, by decoupling it into extracting a permissive strategy and then using maximin-Q learning to find an optimal strategy within that envelope, with correctness and optimality proven for a fragment of specifications.

We consider a problem on the synthesis of reactive controllers that optimize some a priori unknown performance criterion while interacting with an uncontrolled environment such that the system satisfies a given temporal logic specification. We decouple the problem into two subproblems. First, we extract a (maximally) permissive strategy for the system, which encodes multiple (possibly all) ways in which the system can react to the adversarial environment and satisfy the specifications. Then, we quantify the a priori unknown performance criterion as a (still unknown) reward function and compute an optimal strategy for the system within the operating envelope allowed by the permissive strategy by using the so-called maximin-Q learning algorithm. We establish both correctness (with respect to the temporal logic specifications) and optimality (with respect to the a priori unknown performance criterion) of this two-step technique for a fragment of temporal logic specifications. For specifications beyond this fragment, correctness can still be preserved, but the learned strategy may be sub-optimal. We present an algorithm to the overall problem, and demonstrate its use and computational requirements on a set of robot motion planning examples.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes