AI · Dec 2, 2022

STL-Based Synthesis of Feedback Controllers Using Reinforcement Learning

arXiv:2212.01022v1 · 9.0 · 13 citations · h-index: 19 · Has Code

Originality: Highly original

AI Analysis

This work addresses a critical bottleneck in applying reinforcement learning to cyber-physical systems with safety and liveness requirements, offering a systematic solution for controller synthesis.

The paper tackles the problem of designing reward functions that lead reinforcement learning agents to satisfy complex temporal logic specifications, and proposes a new quantitative semantics for Signal Temporal Logic (STL) that supports real-time reward generation. Experiments on continuous control benchmarks indicate that, among the STL semantics compared, the new one is the most suitable for synthesizing feedback controllers.

Deep Reinforcement Learning (DRL) has the potential to be used for synthesizing feedback controllers (agents) for various complex systems with unknown dynamics. These systems are expected to satisfy diverse safety and liveness properties best captured using temporal logic. In RL, the reward function plays a crucial role in specifying the desired behaviour of these agents. However, the problem of designing the reward function for an RL agent to satisfy complex temporal logic specifications has received limited attention in the literature. To address this, we provide a systematic way of generating rewards in real-time by using the quantitative semantics of Signal Temporal Logic (STL), a widely used temporal logic to specify the behaviour of cyber-physical systems. We propose a new quantitative semantics for STL having several desirable properties, making it suitable for reward generation. We evaluate our STL-based reinforcement learning mechanism on several complex continuous control benchmarks and compare our STL semantics with those available in the literature in terms of their efficacy in synthesizing the controller agent. Experimental results establish our new semantics to be the most suitable for synthesizing feedback controllers for complex continuous dynamical systems through reinforcement learning.
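The abstract does not spell out the new semantics, so the sketch below uses the standard min/max robustness semantics of STL over a finite, discrete-time trace to illustrate how a quantitative STL value can be turned into a scalar reward. Positive robustness means the trace satisfies the formula and negative robustness means it violates it. The class names, the `stl_reward` helper, and the choice to reward the robustness of the trajectory observed so far are hypothetical illustrations, not the paper's proposed semantics or its code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

# A signal is a finite, discrete-time sequence of states (named real values).
State = Dict[str, float]
Signal = Sequence[State]


class Formula:
    def rho(self, w: Signal, t: int) -> float:
        """Robustness of this formula on signal w at time step t."""
        raise NotImplementedError


@dataclass
class Pred(Formula):
    """Atomic predicate f(x) > 0; its robustness is f(x_t)."""
    f: Callable[[State], float]

    def rho(self, w, t):
        return self.f(w[t])


@dataclass
class Not(Formula):
    phi: Formula

    def rho(self, w, t):
        return -self.phi.rho(w, t)


@dataclass
class And(Formula):
    left: Formula
    right: Formula

    def rho(self, w, t):
        return min(self.left.rho(w, t), self.right.rho(w, t))


@dataclass
class Always(Formula):
    """G_[a,b] phi: minimum over the window, truncated at the end of the trace."""
    a: int
    b: int
    phi: Formula

    def rho(self, w, t):
        window = range(t + self.a, min(t + self.b, len(w) - 1) + 1)
        # An empty window (trace too short) is treated as vacuously satisfied.
        return min((self.phi.rho(w, k) for k in window), default=float("inf"))


@dataclass
class Eventually(Formula):
    """F_[a,b] phi: maximum over the window, truncated at the end of the trace."""
    a: int
    b: int
    phi: Formula

    def rho(self, w, t):
        window = range(t + self.a, min(t + self.b, len(w) - 1) + 1)
        # An empty window is treated as not (yet) satisfied.
        return max((self.phi.rho(w, k) for k in window), default=float("-inf"))


def stl_reward(spec: Formula, trajectory: Signal) -> float:
    """Hypothetical reward: robustness of spec on the trajectory observed so far."""
    return spec.rho(trajectory, 0)


# Example spec: always stay below x = 1 and eventually exceed x = 0.25 within 3 steps.
safe = Pred(lambda s: 1.0 - s["x"])   # x < 1
goal = Pred(lambda s: s["x"] - 0.25)  # x > 0.25
spec = And(Always(0, 3, safe), Eventually(0, 3, goal))

trace = [{"x": 0.0}, {"x": 0.25}, {"x": 0.5}, {"x": 0.75}]
bad = [{"x": 0.0}, {"x": 0.5}, {"x": 1.25}, {"x": 0.5}]

if __name__ == "__main__":
    print(stl_reward(spec, trace))      # 0.25: satisfied with margin 0.25
    print(stl_reward(spec, trace[:2]))  # 0.0: goal only just reached on this prefix
    print(stl_reward(spec, bad))        # -0.25: safety violated at step 2
```

Evaluating the reward on each growing prefix gives a value at every step rather than only at the end of an episode. A known limitation of this min/max semantics is that its value is determined by a single critical time step, which is the kind of property that alternative quantitative semantics aim to change.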
