LG · AI · Jul 17, 2023

An Alternative to Variance: Gini Deviation for Risk-averse Policy Gradient

arXiv:2307.08873v3 · 15 citations · h-index: 48
Originality: Incremental advance
AI Analysis

This work addresses risk management in reinforcement learning for applications requiring stable and safe decision-making, but it is incremental as it substitutes one risk measure for another.

The paper addresses the limitations of variance-based risk measures in risk-averse reinforcement learning by proposing Gini deviation as an alternative. Its algorithm achieves high return with low risk in domains where other methods fail.

Restricting the variance of a policy's return is a popular choice in risk-averse Reinforcement Learning (RL) due to its clear mathematical definition and easy interpretability. Traditional methods directly restrict the total return variance. Recent methods restrict the per-step reward variance as a proxy. We thoroughly examine the limitations of these variance-based methods, such as sensitivity to numerical scale and hindering of policy learning, and propose to use an alternative risk measure, Gini deviation, as a substitute. We study various properties of this new risk measure and derive a policy gradient algorithm to minimize it. Empirical evaluation in domains where risk-aversion can be clearly defined shows that our algorithm can mitigate the limitations of variance-based risk measures and achieves high return with low risk in terms of variance and Gini deviation when others fail to learn a reasonable policy.
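The sensitivity to numerical scale mentioned in the abstract can be illustrated with a small sketch. It assumes the common definition of Gini deviation as half the expected absolute difference between two independent copies of the return, estimated here over all distinct sample pairs. The helper name `gini_deviation` and the sample returns are hypothetical, and this is not the paper's policy gradient algorithm, only a comparison of the two risk measures.

```python
from itertools import combinations
from statistics import pvariance


def gini_deviation(samples):
    """Pairwise estimate of 0.5 * E|X1 - X2| (assumed definition)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    # Sum absolute differences over all unordered pairs of distinct samples.
    total = sum(abs(a - b) for a, b in combinations(samples, 2))
    # Average over the n*(n-1)/2 pairs, then halve.
    return 0.5 * total / (n * (n - 1) / 2)


# Hypothetical episode returns, and the same returns with rewards scaled by 10.
returns = [1.0, 2.0, 4.0, 7.0]
scaled = [10.0 * r for r in returns]

# Gini deviation grows linearly with the reward scale ...
gd_ratio = gini_deviation(scaled) / gini_deviation(returns)
# ... while variance grows quadratically.
var_ratio = pvariance(scaled) / pvariance(returns)

print(f"Gini deviation ratio: {gd_ratio:.1f}")
print(f"Variance ratio:       {var_ratio:.1f}")
```

Under these assumptions, multiplying every reward by 10 multiplies the Gini deviation by 10 but the variance by 100, so a variance penalty with a fixed weight changes its relative strength far more when rewards are rescaled.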

