LG MLSep 26, 2020

Neurosymbolic Reinforcement Learning with Formally Verified Exploration

Greg Anderson, Abhinav Verma, Isil Dillig, Swarat Chaudhuri

arXiv:2009.12612v219.588 citationsHas Code

Originality Highly original

AI Analysis

This addresses the problem of safe exploration in continuous state-action spaces for RL applications, offering a novel method to bypass verification bottlenecks, though it builds incrementally on prior neurosymbolic and safe RL techniques.

The paper tackles the challenge of provably safe exploration in deep reinforcement learning by introducing Revel, a neurosymbolic RL framework that avoids computationally infeasible neural network verification. The results show that Revel enforces safe exploration where prior methods fail and discovers policies that outperform existing verified exploration approaches.

We present Revel, a partially neural reinforcement learning (RL) framework for provably safe exploration in continuous state and action spaces. A key challenge for provably safe deep RL is that repeatedly verifying neural networks within a learning loop is computationally infeasible. We address this challenge using two policy classes: a general, neurosymbolic class with approximate gradients and a more restricted class of symbolic policies that allows efficient verification. Our learning algorithm is a mirror descent over policies: in each iteration, it safely lifts a symbolic policy into the neurosymbolic space, performs safe gradient updates to the resulting policy, and projects the updated policy into the safe symbolic subset, all without requiring explicit verification of neural networks. Our empirical results show that Revel enforces safe exploration in many scenarios in which Constrained Policy Optimization does not, and that it can discover policies that outperform those learned through prior approaches to verified exploration.

View on arXiv PDF Code

Similar