LG, AI · Oct 23, 2020

Towards Safe Policy Improvement for Non-Stationary MDPs

arXiv:2010.12645v2 · 43 citations
AI Analysis

This work addresses safety in critical systems that carry financial and human-life risks, in domains where non-stationarity is present. It is an incremental step that extends Seldonian algorithms to non-stationary settings.

The paper tackles safe policy improvement in non-stationary Markov decision processes (MDPs), a setting where existing safe methods assume stationarity. It proposes a method that combines model-free reinforcement learning with time-series analysis and provides high-confidence safety guarantees through sequential hypothesis testing and wild bootstrap confidence intervals.
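A minimal sketch of the time-series half of this idea, assuming a simple linear trend over episodes (an assumption; the paper may use richer basis functions of time): fit ordinary least squares to a hypothetical sequence of per-episode performance estimates for a candidate policy, then extrapolate it to future episodes. The names `fit_linear_trend` and `forecast` are illustrative and do not come from the paper's code.

```python
from typing import List, Tuple


def fit_linear_trend(perf: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of perf[t] ~ a + b * t for t = 0, 1, ..., n-1."""
    n = len(perf)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    ts = range(n)
    t_mean = (n - 1) / 2
    y_mean = sum(perf) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, perf))
    var = sum((t - t_mean) ** 2 for t in ts)
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast(perf: List[float], horizon: int) -> List[float]:
    """Extrapolate the fitted linear trend to the next `horizon` episodes."""
    a, b = fit_linear_trend(perf)
    n = len(perf)
    return [a + b * (n + k) for k in range(horizon)]


# Hypothetical per-episode performance estimates of a candidate policy,
# drifting upward as the environment changes (illustrative data only).
history = [1.0, 1.5, 2.0, 2.5]
print(forecast(history, horizon=2))  # → [3.0, 3.5]
```

Under non-stationarity the quantity of interest is the policy's performance in the episodes it will actually be deployed in, which is why a forecast, rather than an average over past episodes, is the natural target.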

Many real-world sequential decision-making problems involve critical systems with financial risks and human-life risks. While several works in the past have proposed methods that are safe for deployment, they assume that the underlying problem is stationary. However, many real-world problems of interest exhibit non-stationarity, and when stakes are high, the cost associated with a false stationarity assumption may be unacceptable. We take the first steps towards ensuring safety, with high confidence, for smoothly-varying non-stationary decision problems. Our proposed method extends a type of safe algorithm, called a Seldonian algorithm, through a synthesis of model-free reinforcement learning with time-series analysis. Safety is ensured using sequential hypothesis testing of a policy's forecasted performance, and confidence intervals are obtained using wild bootstrap.
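The wild bootstrap step can be sketched as follows, under several assumptions that are not taken from the paper: a linear-trend forecast model, Rademacher (±1) residual weights, a basic (reverse-percentile) bootstrap lower bound, and a deployment rule that compares that bound against a fixed baseline performance value. It illustrates a single safety test, not the full sequential procedure, and all function names are hypothetical.

```python
import random
from typing import List, Tuple


def _ols(ts: List[float], ys: List[float]) -> Tuple[float, float]:
    """Simple linear regression ys ~ a + b * ts; returns (a, b)."""
    n = len(ys)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    den = sum((t - t_mean) ** 2 for t in ts)
    b = num / den
    return y_mean - b * t_mean, b


def wild_bootstrap_lower_bound(perf: List[float], horizon_t: int,
                               delta: float = 0.05, n_boot: int = 2000,
                               seed: int = 0) -> float:
    """(1 - delta) one-sided lower bound on the trend forecast at horizon_t.

    Wild bootstrap: keep the time design fixed, multiply each residual by an
    independent Rademacher sign (+1 or -1), refit, and re-forecast.
    """
    rng = random.Random(seed)
    ts = list(range(len(perf)))
    a, b = _ols(ts, perf)
    fitted = [a + b * t for t in ts]
    resid = [y - f for y, f in zip(perf, fitted)]
    point = a + b * horizon_t
    # Bootstrap distribution of (bootstrap forecast - original forecast).
    deviations = []
    for _ in range(n_boot):
        y_star = [f + r * rng.choice((-1.0, 1.0)) for f, r in zip(fitted, resid)]
        a_s, b_s = _ols(ts, y_star)
        deviations.append(a_s + b_s * horizon_t - point)
    deviations.sort()
    # Upper (1 - delta) quantile of deviations gives the basic bootstrap lower bound.
    q = deviations[min(n_boot - 1, int((1 - delta) * n_boot))]
    return point - q


def safety_test(candidate_perf: List[float], baseline_perf: float,
                horizon_t: int, delta: float = 0.05) -> bool:
    """Hypothetical Seldonian-style check: deploy the candidate only if the
    lower bound on its forecast performance is at least the baseline's."""
    return wild_bootstrap_lower_bound(candidate_perf, horizon_t, delta) >= baseline_perf


candidate = [1.0, 1.6, 1.9, 2.6, 3.0, 3.4]  # illustrative data only
print(safety_test(candidate, baseline_perf=1.0, horizon_t=7))  # → True
```

Rademacher weights keep each residual's magnitude while randomizing its sign, which is what makes the wild bootstrap robust to heteroscedastic noise, a useful property when the variance of a policy's performance may itself drift over time.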

Code Implementations: 1 repo
