RO FL LG · Sep 7, 2021

Safety-Critical Learning of Robot Control with Temporal Logic Specifications

arXiv:2109.02791v7 · 15.69 citations

Originality: Incremental advance

AI Analysis

This work addresses safety-critical learning for robotic systems and offers a modular approach to handling complex tasks. It appears incremental, since its contribution lies mainly in combining existing techniques such as LTL and ECBFs.

The paper tackles the challenge of ensuring safe exploration and effective exploitation in reinforcement learning for robotic control with unknown models and uncertainties. It proposes a framework that integrates Linear Temporal Logic for task specification, a novel reward scheme with probabilistic guarantees, and Gaussian Processes with Exponential Control Barrier Functions for safe exploration. The reported outcome is near-perfect success rates and high-probability safety during training.

Reinforcement learning (RL) is a promising approach, but its success in real-world applications remains limited, because ensuring safe exploration and facilitating adequate exploitation is challenging when controlling robotic systems with unknown models and measurement uncertainties. The learning problem becomes even more difficult for complex tasks over continuous state-action spaces. In this paper, we propose a learning-based robotic control framework consisting of several aspects: (1) we leverage Linear Temporal Logic (LTL) to express complex tasks over infinite horizons, which are translated into a novel automaton structure; (2) we detail an innovative reward scheme for LTL satisfaction with a probabilistic guarantee and then, by applying a reward shaping technique, develop a modular policy-gradient architecture that exploits the automaton structure to decompose overall tasks and enhance the performance of learned controllers; (3) by incorporating Gaussian Processes (GPs) to estimate the uncertain system dynamics, we synthesize model-based safe exploration during learning using Exponential Control Barrier Functions (ECBFs), which generalize to systems with high relative degree; (4) to further improve the efficiency of exploration, we use the properties of LTL automata and ECBFs to propose a safe guiding process. Finally, we demonstrate the effectiveness of the framework in several robotic environments, showing an ECBF-based modular deep RL algorithm that achieves near-perfect success rates and guards safety with high probability during training.
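
To make the role of an ECBF in safe exploration more concrete, here is a minimal sketch of an ECBF safety filter for a relative-degree-2 system. It is not the paper's implementation: it assumes a hypothetical 1-D double integrator with fully known dynamics (the paper instead estimates uncertain dynamics with GPs), a single position limit `P_MAX`, and illustrative gains `K0` and `K1`. All names in the block are made up for this example.

```python
# Hypothetical 1-D example: position p, velocity v, acceleration command u.
# Assumed known dynamics: p' = v, v' = u.
# Safe set: h(p) = P_MAX - p >= 0, which has relative degree 2 with respect to u.

P_MAX = 1.0          # assumed position limit
K0, K1 = 4.0, 4.0    # illustrative gains: s^2 + K1*s + K0 = (s + 2)^2
DT = 0.01            # explicit Euler step size


def ecbf_upper_bound(p, v):
    """Largest u satisfying h'' + K1*h' + K0*h >= 0, with h' = -v and h'' = -u."""
    return K0 * (P_MAX - p) - K1 * v


def safe_control(p, v, u_nominal):
    """Control closest to u_nominal that satisfies the ECBF condition.

    For a scalar input the usual quadratic program reduces to a clamp.
    """
    return min(u_nominal, ecbf_upper_bound(p, v))


def simulate(use_filter, steps=500, u_nominal=1.0):
    """Roll out a constant nominal command, optionally passed through the filter."""
    p, v = 0.0, 0.0
    positions = [p]
    for _ in range(steps):
        u = safe_control(p, v, u_nominal) if use_filter else u_nominal
        p, v = p + v * DT, v + u * DT  # explicit Euler update
        positions.append(p)
    return positions
```

Calling `simulate(use_filter=True)` keeps the position at or below `P_MAX`, while `simulate(use_filter=False)` drives it past the limit under the same nominal command. In an RL setting the nominal command would come from the exploring policy, and with multi-dimensional inputs the clamp would be replaced by a small quadratic program.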
