LG
Nov 4, 2021

Model-Free Risk-Sensitive Reinforcement Learning

arXiv:2111.02907v1
11 citations
Originality: Incremental advance
AI Analysis

This work addresses risk-sensitive reinforcement learning for decision-making under uncertainty, but it appears incremental as it builds on existing temporal-difference methods.

The paper makes reinforcement learning risk-sensitive by extending temporal-difference learning to estimate the Gaussian free energy from samples. The result is a model-free algorithm whose value estimates depend on both the mean and the variance of the sampled targets.
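To make the dependence on mean and variance concrete, here is the standard closed form of the free energy for a Gaussian. The symbols are notation assumed here rather than taken from this page: β is the risk parameter, and μ and σ² are the mean and variance of the sampled quantity X.

```latex
F_\beta(X) \;=\; \frac{1}{\beta}\,\log \mathbb{E}\!\left[e^{\beta X}\right]
           \;=\; \mu + \frac{\beta}{2}\,\sigma^2,
\qquad X \sim \mathcal{N}(\mu, \sigma^2),\ \beta \neq 0 .
```

As β → 0 this reduces to the plain expectation μ. For β > 0 variance raises the value and for β < 0 it lowers it, which is what makes the quantity usable as a risk-sensitive certainty equivalent.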

We extend temporal-difference (TD) learning in order to obtain risk-sensitive, model-free reinforcement learning algorithms. This extension can be regarded as a modification of the Rescorla-Wagner rule, where the (sigmoidal) stimulus is taken to be either the event of over- or underestimating the TD target. As a result, one obtains a stochastic approximation rule for estimating the free energy from i.i.d. samples generated by a Gaussian distribution with unknown mean and variance. Since the Gaussian free energy is known to be a certainty-equivalent sensitive to the mean and the variance, the learning rule has applications in risk-sensitive decision-making.
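The abstract does not spell out the update itself, so the following is only a minimal sketch under an assumed form: v ← v + 2α·σ(β(x − v))·(x − v), where σ is the logistic sigmoid, α a step size, and β a risk parameter. All names and defaults are illustrative, not the authors' code. Under this assumption, β = 0 gives σ = 1/2 and recovers the ordinary running-average (Rescorla-Wagner style) update. For Gaussian samples the expected update vanishes at v = μ + βσ²/2 by a symmetry argument, so the estimate should settle near the Gaussian free energy.

```python
import math
import random


def gaussian_free_energy(mu, sigma, beta):
    """Closed form of (1/beta) * log E[exp(beta * X)] for X ~ N(mu, sigma^2)."""
    return mu + 0.5 * beta * sigma ** 2


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def risk_sensitive_estimate(samples, beta, alpha=0.001, v0=0.0):
    """Assumed sigmoid-gated update (hypothetical form, see lead-in).

    Each sample x moves the estimate v toward x, with the step scaled by
    sigmoid(beta * (x - v)): for beta > 0, samples above the current
    estimate get a larger weight than samples below it; for beta < 0
    the weighting is reversed.
    """
    v = v0
    for x in samples:
        delta = x - v  # error between sample (target) and estimate
        v += 2.0 * alpha * sigmoid(beta * delta) * delta
    return v


if __name__ == "__main__":
    rng = random.Random(0)
    mu, sigma = 1.0, 2.0
    samples = [rng.gauss(mu, sigma) for _ in range(200_000)]
    for beta in (-0.2, 0.0, 0.2):
        estimate = risk_sensitive_estimate(samples, beta)
        target = gaussian_free_energy(mu, sigma, beta)
        print(f"beta={beta:+.1f}  estimate={estimate:.3f}  closed form={target:.3f}")
```

With a constant step size the estimate keeps fluctuating around its fixed point, so the comparison against the closed form is only approximate. A decaying step size would be the usual choice for a stochastic approximation scheme that should converge.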

