LG AI Jun 19, 2021

Scalable Safety-Critical Policy Evaluation with Accelerated Rare Event Sampling

Mengdi Xu, Peide Huang, Fengpei Li, Jiacheng Zhu, Xuewei Qi, Kentaro Oguchi, Zhiyuan Huang, Henry Lam, Ding Zhao

arXiv:2106.10566v28.45 citations Has Code

Originality: Incremental advance

AI Analysis

This paper addresses scalable safety-critical policy evaluation for reinforcement learning systems. The contribution appears incremental, as it builds on existing adaptive importance sampling techniques.

The paper tackles the challenge of evaluating rare but high-stakes events in reinforcement learning policies by proposing the Accelerated Policy Evaluation (APE) method, which estimates rare event probabilities with smaller bias using orders of magnitude fewer samples than baselines.

Evaluating rare but high-stakes events is one of the main challenges in obtaining reliable reinforcement learning policies, especially in large or infinite state/action spaces where limited scalability dictates a prohibitively large number of testing iterations. On the other hand, a biased or inaccurate policy evaluation in a safety-critical system could cause unexpected catastrophic failures during deployment. This paper proposes the Accelerated Policy Evaluation (APE) method, which simultaneously uncovers rare events and estimates the rare event probability in Markov decision processes. The APE method treats the environment nature as an adversarial agent and, through adaptive importance sampling, learns toward the zero-variance sampling distribution for the policy evaluation. Moreover, APE scales to large discrete or continuous spaces by incorporating function approximators. We investigate the convergence property of APE in the tabular setting. Our empirical studies show that APE can estimate the rare event probability with a smaller bias while using orders of magnitude fewer samples than baselines in multi-agent and single-agent environments.
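To make the importance sampling idea concrete, here is a minimal sketch on a toy problem. It is not the APE algorithm and involves no Markov decision process. It only illustrates why sampling from a distribution tilted toward the rare event, and reweighting by the likelihood ratio, needs far fewer samples than naive Monte Carlo. The target event (a standard normal variable exceeding 4), the fixed shifted-Gaussian proposal, and all function names are assumptions chosen for illustration.

```python
import math
import random


def naive_estimate(threshold, n, rng):
    """Plain Monte Carlo: fraction of N(0, 1) draws above the threshold."""
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > threshold)
    return hits / n


def importance_sampling_estimate(threshold, n, rng, shift):
    """Sample from the proposal N(shift, 1) and reweight each hit.

    The weight is the likelihood ratio p(x) / q(x) between the nominal
    density N(0, 1) and the proposal N(shift, 1), which simplifies to
    exp(-shift * x + shift**2 / 2).
    """
    total = 0.0
    for _ in range(n):
        x = rng.gauss(shift, 1.0)
        if x > threshold:
            total += math.exp(-shift * x + 0.5 * shift * shift)
    return total / n


def exact_tail(threshold):
    """Closed-form P(X > threshold) for X ~ N(0, 1), used as a reference."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


rng = random.Random(0)   # fixed seed so the run is repeatable
threshold = 4.0          # hypothetical "failure" level; true probability is about 3.2e-5
n = 10_000

exact = exact_tail(threshold)
naive = naive_estimate(threshold, n, rng)
is_est = importance_sampling_estimate(threshold, n, rng, shift=threshold)

print(f"exact tail probability : {exact:.3e}")
print(f"naive Monte Carlo      : {naive:.3e}")
print(f"importance sampling    : {is_est:.3e}")
```

With 10,000 samples the naive estimator expects to see fewer than one rare event, so it typically returns zero or a badly quantized value, while the reweighted estimator is usually within a few percent of the exact answer. The fixed shift used here is a hand-picked proposal, not a zero-variance one. In this toy setting the zero-variance proposal would be the nominal density conditioned on the event, which already requires knowing the probability being estimated. That is why the abstract describes learning toward it adaptively.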
