LG · AI · LO · SY · ML · Sep 23, 2019

Modular Deep Reinforcement Learning with Temporal Logic Specifications

arXiv:1909.11591v2 · 45 citations
AI Analysis

This paper addresses reward sparsity in RL for robotics and autonomous systems. The contribution appears incremental, since it builds on the existing DDPG method by adding a modular architecture.

The paper tackles sparse rewards that carry a high-level temporal structure in continuous-state, continuous-action MDPs. It proposes a modular deep RL framework in which a finite-state machine guides the agent, and it evaluates the synthesized policy by its success rate in a Mars rover experiment.
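The way a finite-state machine guides the agent can be sketched as an on-the-fly product step: the agent tracks a pair (MDP state, automaton state), and the automaton advances on the label of each new MDP state, so the product is never built up front. This is a minimal sketch, not the authors' implementation. The `FSM` class, `product_step`, the toy 1-D dynamics, the labels "A" and "B", and the 0/1 reward on accepting automaton states are all hypothetical choices made for illustration; the paper's own automaton and reward scheme may differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple


@dataclass
class FSM:
    """Finite-state machine over string labels (hypothetical encoding)."""
    initial: int
    accepting: FrozenSet[int]
    delta: Dict[Tuple[int, str], int]

    def step(self, q: int, label: str) -> int:
        # Any (state, label) pair without an explicit transition self-loops.
        return self.delta.get((q, label), q)


def product_step(
    mdp_step: Callable[[float, float], float],
    label_fn: Callable[[float], str],
    fsm: FSM,
    s: float,
    q: int,
    a: float,
) -> Tuple[float, int, float]:
    """Advance the product state (s, q) by one step, computed on the fly."""
    s_next = mdp_step(s, a)                  # continuous MDP transition
    q_next = fsm.step(q, label_fn(s_next))   # automaton reads the new state's label
    # Assumed sparse reward: 1 inside the accepting set, 0 everywhere else.
    reward = 1.0 if q_next in fsm.accepting else 0.0
    return s_next, q_next, reward


# Toy 1-D "rover" (hypothetical task): visit region A = [1, 2), then region B = [3, 4).
def mdp_step(x: float, a: float) -> float:
    return x + max(-1.0, min(1.0, a))  # action clipped to [-1, 1]


def label_fn(x: float) -> str:
    if 1.0 <= x < 2.0:
        return "A"
    if 3.0 <= x < 4.0:
        return "B"
    return ""


fsm = FSM(initial=0, accepting=frozenset({2}), delta={(0, "A"): 1, (1, "B"): 2})
s, q = 0.0, fsm.initial
trace = []
for a in [1.0, 1.0, 1.0, 0.5]:
    s, q, r = product_step(mdp_step, label_fn, fsm, s, q, a)
    trace.append((s, q, r))
print(trace)
```

The reward stays at zero until the automaton has seen A and then B, which is the kind of sparsity the framework targets; the automaton state `q` tells the agent which sub-task it is currently working on.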

We propose an actor-critic, model-free, and online Reinforcement Learning (RL) framework for continuous-state continuous-action Markov Decision Processes (MDPs) when the reward is highly sparse but encompasses a high-level temporal structure. We represent this temporal structure by a finite-state machine and construct an on-the-fly synchronised product of the MDP and the finite-state machine. The temporal structure acts as a guide for the RL agent within the product, where a modular Deep Deterministic Policy Gradient (DDPG) architecture is proposed to generate a low-level control policy. We evaluate our framework in a Mars rover experiment and present the success rate of the synthesised policy.
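The modular DDPG idea can be illustrated under the assumption that "modular" means one actor-critic pair per automaton state, with both action selection and the actor update dispatched to the module of the current automaton state `q`. The sketch below is hypothetical and heavily simplified: `LinearActor`, `QuadraticCritic`, and `ModularAgent` are illustrative names, the critic is a fixed closed-form function rather than a learned network, and replay buffers, target networks, exploration noise, and critic training are all omitted. The actor step is the deterministic policy gradient, i.e. the chain rule through dQ/da evaluated at a = mu(s).

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class LinearActor:
    """Deterministic policy mu(s) = max_action * tanh(w * s + b) for a 1-D state."""
    w: float = 0.0
    b: float = 0.0
    max_action: float = 1.0

    def act(self, s: float) -> float:
        return self.max_action * math.tanh(self.w * s + self.b)


@dataclass
class QuadraticCritic:
    """Hypothetical fixed critic Q(s, a) = -(a - k * s - c) ** 2 (no critic training here)."""
    k: float = 0.0
    c: float = 0.0

    def dq_da(self, s: float, a: float) -> float:
        return -2.0 * (a - self.k * s - self.c)


@dataclass
class ModularAgent:
    """One actor-critic module per automaton state q (assumed modular layout)."""
    actors: Dict[int, LinearActor]
    critics: Dict[int, QuadraticCritic]

    def act(self, s: float, q: int) -> float:
        # Only the module attached to the current automaton state is queried.
        return self.actors[q].act(s)

    def actor_update(self, s: float, q: int, lr: float = 0.1) -> None:
        # Deterministic policy gradient restricted to module q:
        # dJ/dtheta ~= dQ/da |_{a = mu(s)} * dmu/dtheta
        actor = self.actors[q]
        a = actor.act(s)
        dq = self.critics[q].dq_da(s, a)
        pre = actor.w * s + actor.b
        dmu_dpre = actor.max_action * (1.0 - math.tanh(pre) ** 2)
        actor.w += lr * dq * dmu_dpre * s  # gradient ascent on Q
        actor.b += lr * dq * dmu_dpre


agent = ModularAgent(
    actors={0: LinearActor(), 1: LinearActor()},
    critics={0: QuadraticCritic(c=0.5), 1: QuadraticCritic(c=-0.5)},
)
for _ in range(1000):
    for s in (0.0, 0.5, 1.0):
        agent.actor_update(s, q=0)
        agent.actor_update(s, q=1)
print(round(agent.act(0.5, 0), 3), round(agent.act(0.5, 1), 3))
```

Because gradients only flow into module `q`, learning the behaviour for one automaton state does not overwrite the policy learned for another, which is one plausible motivation for a modular rather than a single monolithic actor.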

Code Implementations: 2 repos
