LG | AI | ML | Jul 7, 2020

Provably Safe PAC-MDP Exploration Using Analogies

arXiv:2007.03574v2 | 12 citations
AI Analysis

This work addresses safe exploration in reinforcement learning for safety-critical applications. Unlike most existing methods, it guarantees safety during the exploration process itself and does not assume known or deterministic transition dynamics.

The paper tackles the challenge of balancing exploration and safety in reinforcement learning for safety-critical domains. It proposes the Analogous Safe-state Exploration (ASE) algorithm, which provably ensures safety during exploration in MDPs with unknown, stochastic dynamics, learns a near-optimal policy in a PAC-MDP sense, and empirically improves sample efficiency over existing methods.

A key challenge in applying reinforcement learning to safety-critical domains is understanding how to balance exploration (needed to attain good performance on the task) with safety (needed to avoid catastrophic failure). Although a growing line of work in reinforcement learning has investigated this area of "safe exploration," most existing techniques either 1) do not guarantee safety during the actual exploration process; and/or 2) limit the problem to a priori known and/or deterministic transition dynamics with strong smoothness assumptions. Addressing this gap, we propose Analogous Safe-state Exploration (ASE), an algorithm for provably safe exploration in MDPs with unknown, stochastic dynamics. Our method exploits analogies between state-action pairs to safely learn a near-optimal policy in a PAC-MDP sense. ASE also guides exploration towards the most task-relevant states, which empirically results in significant improvements in sample efficiency compared to existing methods.

Code Implementations: 1 repo
