LG
Oct 8, 2023

DRL-ORA: Distributional Reinforcement Learning with Online Risk Adaption

arXiv:2310.05179v5
h-index: 1
Originality: Incremental advance
AI Analysis

The paper addresses the problem of achieving reliable policies in safety-critical RL applications and offers a flexible, explainable framework, though its contribution appears incremental: it mainly unifies existing risk adaptation approaches.

The paper tackles the challenge of dynamically adjusting epistemic risk levels in reinforcement learning for safety-critical settings. It proposes DRL-ORA, which quantifies epistemic and aleatory uncertainty and adapts risk levels online, and reports that it outperforms methods with fixed or manually designed risk levels in multiple classes of tasks.

One of the main challenges in reinforcement learning (RL) is that the agent has to make decisions that influence future performance without having complete knowledge of the environment. Dynamically adjusting the level of epistemic risk during the learning process can help to achieve reliable policies in safety-critical settings more efficiently. In this work, we propose a new framework, Distributional RL with Online Risk Adaptation (DRL-ORA). This framework quantifies both epistemic and implicit aleatory uncertainties in a unified manner and dynamically adjusts the epistemic risk levels by solving a total variation minimization problem online. The framework unifies existing variants of risk adaptation approaches and offers better explainability and flexibility. The selection of risk levels is performed efficiently via a grid search using a Follow-The-Leader-type algorithm, where the offline oracle also corresponds to a "satisficing measure" under a specially modified loss function. We show that DRL-ORA outperforms existing methods that rely on fixed risk levels or manually designed risk level adaptation in multiple classes of tasks.
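
The abstract mentions selecting risk levels by grid search with a Follow-The-Leader-type algorithm. As a rough illustration of that general idea only, the sketch below runs plain Follow-The-Leader over a fixed grid of candidate risk levels: at each round it plays the candidate with the smallest cumulative loss so far. It is not the paper's algorithm. The function names, the grid, and the toy loss are all hypothetical stand-ins, and the paper's total variation objective and modified loss function are not reproduced here.

```python
from typing import Callable, List, Sequence


def follow_the_leader(
    grid: Sequence[float],
    loss_fn: Callable[[float, int], float],
    num_rounds: int,
) -> List[float]:
    """Return the grid point chosen at each round by Follow-The-Leader."""
    cumulative = [0.0] * len(grid)  # running loss of every candidate
    choices: List[float] = []
    for t in range(num_rounds):
        # Play the candidate with the smallest cumulative loss so far.
        # Ties go to the first grid point, so round 0 is an arbitrary pick.
        best = min(range(len(grid)), key=lambda i: cumulative[i])
        choices.append(grid[best])
        # After acting, the round-t loss of every candidate is revealed.
        for i, alpha in enumerate(grid):
            cumulative[i] += loss_fn(alpha, t)
    return choices


def toy_loss(alpha: float, t: int) -> float:
    """Hypothetical stand-in loss: squared distance to a fixed target level."""
    return (alpha - 0.7) ** 2


risk_grid = [i / 10 for i in range(1, 11)]  # candidate levels 0.1, ..., 1.0
history = follow_the_leader(risk_grid, toy_loss, num_rounds=5)
```

With this toy loss the first choice is arbitrary, and every later round settles on the grid point closest to the assumed target. In an actual agent, the loss would presumably come from the learner's own uncertainty estimates rather than from a fixed target.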
