LG | AI | Dec 30, 2022

Risk-Sensitive Policy with Distributional Reinforcement Learning

arXiv:2212.14743v1 | 14 citations | h-index: 43
Originality: Highly original
AI Analysis

This work addresses the need for risk-aware policies in critical applications where classical RL methods ignore potential risks, offering a practical and interpretable solution.

The paper tackles the problem of risk-sensitive decision-making in reinforcement learning by introducing a novel methodology based on distributional RL, which enables learning policies that balance risk minimization and expected return maximization through a risk-based utility function.

Classical reinforcement learning (RL) techniques are generally concerned with the design of decision-making policies driven by the maximisation of the expected outcome. Nevertheless, this approach does not take into consideration the potential risk associated with the actions taken, which may be critical in certain applications. To address that issue, the present research work introduces a novel methodology based on distributional RL to derive sequential decision-making policies that are sensitive to risk, the latter being modelled by the tail of the return probability distribution. The core idea is to replace the $Q$ function, which generally stands at the core of learning schemes in RL, with another function that takes into account both the expected return and the risk. Named the risk-based utility function $U$, it can be extracted from the random return distribution $Z$ naturally learnt by any distributional RL algorithm. This makes it possible to span the complete potential trade-off between risk minimisation and expected return maximisation, in contrast to fully risk-averse methodologies. Fundamentally, this research yields a truly practical and accessible solution for learning risk-sensitive policies with minimal modification to the distributional RL algorithm, and with an emphasis on the interpretability of the resulting decision-making process.
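
To make the idea of extracting $U$ from $Z$ concrete, here is a minimal sketch, not the paper's implementation. It assumes that $Z$ is represented by equally weighted quantile samples per action (as in quantile-based distributional RL), that the risk is measured as the mean of the worst fraction `rho` of those samples (a CVaR-style tail measure), and that $U$ is a convex combination of expected return and tail risk weighted by `alpha`. The function names, `alpha`, and `rho` are all hypothetical choices made for illustration.

```python
import numpy as np

def risk_based_utility(quantiles, alpha=0.5, rho=0.1):
    """Illustrative utility U for each action (assumed form, not the paper's).

    quantiles: array of shape (n_actions, n_quantiles) holding equally
               weighted samples of the return distribution Z(s, a).
    alpha:     weight on expected return; (1 - alpha) goes to the tail risk.
    rho:       fraction of the lower tail used to measure risk.
    """
    quantiles = np.sort(np.asarray(quantiles, dtype=float), axis=1)
    n_quantiles = quantiles.shape[1]
    # Number of worst-case samples in the tail (at least one).
    k = max(1, int(np.ceil(rho * n_quantiles)))
    expected_return = quantiles.mean(axis=1)        # plays the role of Q
    tail_risk = quantiles[:, :k].mean(axis=1)       # mean of the worst k samples
    return alpha * expected_return + (1.0 - alpha) * tail_risk

def greedy_action(quantiles, alpha=0.5, rho=0.1):
    """Pick the action maximising U instead of the usual Q."""
    return int(np.argmax(risk_based_utility(quantiles, alpha, rho)))

# Two hypothetical actions: a safe one and a risky one with a higher mean.
safe = [1.0] * 10
risky = [-5.0] + [2.0] * 9
z = np.array([safe, risky])

print(greedy_action(z, alpha=1.0, rho=0.2))  # risk-neutral choice
print(greedy_action(z, alpha=0.0, rho=0.2))  # fully risk-averse choice
```

Under these assumptions, `alpha = 1` recovers the usual expected-return criterion and `alpha = 0` gives a fully risk-averse policy, while intermediate values span the trade-off between the two that the abstract describes.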

Code Implementations: 1 repo
