AI · LG · LO · Jan 10, 2022

Verified Probabilistic Policies for Deep Reinforcement Learning

arXiv:2201.03698v2 · 8 citations
AI Analysis

This work addresses the need for safe and correct execution of deep reinforcement learning policies, particularly the probabilistic policies used in settings such as adversarial environments. Its contribution is incremental: it builds on existing verification techniques for deep neural networks and continuous-state dynamical systems.

The paper tackles the problem of formally verifying probabilistic policies in deep reinforcement learning. It proposes an abstraction approach based on interval Markov decision processes that provides probabilistic guarantees on policy execution, and it implements the approach and demonstrates its effectiveness on reinforcement learning benchmarks.

Deep reinforcement learning is an increasingly popular technique for synthesising policies to control an agent's interaction with its environment. There is also growing interest in formally verifying that such policies are correct and execute safely. Progress has been made in this area by building on existing work for verification of deep neural networks and of continuous-state dynamical systems. In this paper, we tackle the problem of verifying probabilistic policies for deep reinforcement learning, which are used to, for example, tackle adversarial environments, break symmetries and manage trade-offs. We propose an abstraction approach, based on interval Markov decision processes, that yields probabilistic guarantees on a policy's execution, and present techniques to build and solve these models using abstract interpretation, mixed-integer linear programming, entropy-based refinement and probabilistic model checking. We implement our approach and illustrate its effectiveness on a selection of reinforcement learning benchmarks.
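
To make the interval Markov decision process idea in the abstract concrete, the following is a minimal sketch of how an upper bound on the probability of reaching a failure state can be computed when transition probabilities are only known to lie in intervals. It is not the authors' implementation, which relies on abstract interpretation, mixed-integer linear programming and a probabilistic model checker. The function names, the dictionary encoding of the model and the toy numbers are all assumptions made for illustration.

```python
# Illustrative sketch only: robust value iteration on a tiny interval MDP.
# All names and numbers here are hypothetical, not taken from the paper.

def worst_case_distribution(lower, upper, values):
    """Choose probabilities within [lower, upper] that maximise expected value.

    Start from the lower bounds, then give the remaining probability mass
    to the successors with the highest values first.
    """
    probs = list(lower)
    remaining = 1.0 - sum(lower)
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    for i in order:
        extra = min(upper[i] - lower[i], remaining)
        probs[i] += extra
        remaining -= extra
    return probs


def max_reach_probability(imdp, targets, num_states, iterations=1000, tol=1e-10):
    """Upper bound on the probability of reaching a target state.

    imdp maps a state to a list of actions; each action is a tuple
    (successors, lower_bounds, upper_bounds). States without actions are sinks.
    """
    values = [1.0 if s in targets else 0.0 for s in range(num_states)]
    for _ in range(iterations):
        new_values = list(values)
        for s in range(num_states):
            if s in targets:
                continue
            best = 0.0
            for succs, lower, upper in imdp.get(s, []):
                succ_values = [values[t] for t in succs]
                probs = worst_case_distribution(lower, upper, succ_values)
                best = max(best, sum(p * v for p, v in zip(probs, succ_values)))
            new_values[s] = best
        if max(abs(a - b) for a, b in zip(values, new_values)) < tol:
            return new_values
        values = new_values
    return values


# Toy model: state 0 is the initial abstract state, 1 is failure, 2 is a safe sink.
toy_imdp = {
    0: [
        # Action A: fail with probability in [0.1, 0.3], otherwise safe.
        ([1, 2], [0.1, 0.7], [0.3, 0.9]),
        # Action B: stay in state 0 with probability 0.5, fail with [0.0, 0.2].
        ([0, 1, 2], [0.5, 0.0, 0.3], [0.5, 0.2, 0.5]),
    ],
}
bound = max_reach_probability(toy_imdp, targets={1}, num_states=3)
print(f"Upper bound on failure probability from state 0: {bound[0]:.4f}")
```

Taking the maximum over both the actions and the interval-constrained distributions gives a conservative bound: any concrete behaviour consistent with the intervals fails with at most this probability. In this toy encoding the actions stand in for the nondeterminism that an abstraction introduces.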

Code Implementations: 1 repo