LG · AI · May 30, 2022

Reinforcement Learning with a Terminator

NVIDIA
arXiv:2205.15376v2 · 5 citations · h-index: 81
Originality: Incremental advance
AI Analysis

This work addresses the problem of handling unpredictable interruptions in real-world RL applications such as autonomous driving, though it is an incremental extension of the MDP framework.

The paper tackles reinforcement learning with exogenous termination by introducing the Termination Markov Decision Process (TerMDP), which models episodes interrupted by external observers, as in autonomous driving. It presents a provably efficient algorithm with a regret bound, along with a scalable practical variant that shows fast convergence and significant improvement over baselines on driving and MinAtar benchmarks.

We present the problem of reinforcement learning with exogenous termination. We define the Termination Markov Decision Process (TerMDP), an extension of the MDP framework, in which episodes may be interrupted by an external non-Markovian observer. This formulation accounts for numerous real-world situations, such as a human interrupting an autonomous driving agent for reasons of discomfort. We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds. We use these to construct a provably efficient algorithm, which accounts for termination, and bound its regret. Motivated by our theoretical analysis, we design and implement a scalable approach, which combines optimism (w.r.t. termination) and a dynamic discount factor, incorporating the termination probability. We deploy our method on high-dimensional driving and MinAtar benchmarks. Additionally, we test our approach on human data in a driving setting. Our results demonstrate fast convergence and significant improvement over various baseline approaches.
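The abstract's two practical ingredients, optimism with respect to termination and a discount factor that folds in the termination probability, can be illustrated with a small sketch. It is not the authors' implementation. The logistic-of-accumulated-costs termination model, the per-step confidence `bonuses`, the `bias` term, the assumption that costs are nonnegative, and the helper names (`termination_prob`, `dynamic_discounted_return`) are all hypothetical choices made for illustration. The history dependence of `termination_prob` mirrors the idea of a non-Markovian observer.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def termination_prob(step_costs: Sequence[float], bias: float = 0.0) -> float:
    """Hypothetical non-Markovian termination model: the observer's chance of
    stopping the episode depends on the cost accumulated over the whole history."""
    return sigmoid(sum(step_costs) + bias)


def dynamic_discounted_return(
    rewards: Sequence[float],
    est_costs: Sequence[float],
    bonuses: Sequence[float],
    gamma: float = 0.99,
    bias: float = -4.0,
) -> float:
    """Discounted return in which each step's discount is shrunk by the
    (optimistic) probability that the episode survives that step."""
    total = 0.0
    weight = 1.0  # running product of per-step effective discounts
    history: List[float] = []
    for r, c, b in zip(rewards, est_costs, bonuses):
        total += weight * r
        # Optimism w.r.t. termination: subtract a confidence bonus from the
        # estimated cost (assumed nonnegative), so termination looks no more
        # likely than the data plausibly allows.
        history.append(max(c - b, 0.0))
        p_term = termination_prob(history, bias)
        # Dynamic discount: gamma scaled by the estimated survival probability.
        weight *= gamma * (1.0 - p_term)
    return total
```

With near-zero termination probability the sketch reduces to an ordinary discounted return, while larger estimated costs shorten the effective horizon and larger bonuses lengthen it again.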

Code Implementations: 1 repo