Model-Free Learning of Safe yet Effective Controllers
This work addresses the need for safe and efficient controllers in robotics and autonomous systems, but it appears incremental because it builds on existing RL and LTL methods.
The paper tackles the problem of learning safe and effective control policies in unknown Markov decision processes. It proposes a model-free reinforcement learning algorithm that prioritizes, in order, safety, task satisfaction expressed in linear temporal logic, and control performance, and it illustrates the applicability of the approach.
We study the problem of learning safe control policies that are also effective, i.e., policies that maximize both the probability of satisfying a linear temporal logic (LTL) specification of a task and the discounted reward capturing the (classic) control performance. We consider unknown environments modeled as Markov decision processes. We propose a model-free reinforcement learning algorithm that learns a policy that first maximizes the probability of ensuring safety, then the probability of satisfying the given LTL specification, and lastly the sum of discounted Quality of Control rewards. Finally, we illustrate the applicability of our RL-based approach.
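The priority ordering described above (safety first, then LTL satisfaction, then Quality of Control) can be illustrated with a minimal sketch of lexicographic action selection. This is not the paper's algorithm. It only assumes that three per-action value estimates are already available, and the function name, the tolerance `tol`, the action names, and the numbers are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical per-action estimates: (safety probability, LTL satisfaction
# probability, discounted Quality of Control return). Illustrative values only.
Estimates = Dict[str, Tuple[float, float, float]]


def lexicographic_argmax(values: Estimates, tol: float = 1e-9) -> str:
    """Return the action that is best under a lexicographic ordering.

    Candidates are filtered level by level: only actions within `tol` of the
    best safety estimate survive, then only those within `tol` of the best
    LTL estimate among the survivors, and finally the best QoC return decides.
    """
    candidates = list(values)
    for level in range(3):
        best = max(values[a][level] for a in candidates)
        candidates = [a for a in candidates if values[a][level] >= best - tol]
    # Remaining ties are broken by insertion order (an arbitrary choice here).
    return candidates[0]


estimates: Estimates = {
    "left": (0.99, 0.40, 5.0),
    "right": (0.99, 0.70, 1.0),
    "forward": (0.90, 0.95, 9.0),
}
chosen = lexicographic_argmax(estimates)
print(chosen)
```

In this toy example, "forward" has the highest LTL and QoC estimates but is discarded at the first level because it is less safe. Lower-priority objectives only break ties among actions that are equally good on every higher-priority objective.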