LG · SY · NC · ML · Sep 27, 2022

Reinforcement Learning with Non-Exponential Discounting

arXiv:2209.13413v2 · 16 citations · h-index: 29
Originality: Incremental advance
AI Analysis

This work addresses the gap between RL and models of human decision-making, enabling the analysis of human discounting in sequential tasks. The advance is incremental, since it generalizes existing theory.

The authors model non-exponential discounting in reinforcement learning. They derive a Hamilton-Jacobi-Bellman equation characterizing the optimal policy, show how inverse RL can recover properties of the discount function from decision data, and validate the approach on two simulated problems.

Commonly in reinforcement learning (RL), rewards are discounted over time using an exponential function to model time preference, thereby bounding the expected long-term reward. In contrast, in economics and psychology, it has been shown that humans often adopt a hyperbolic discounting scheme, which is optimal when a specific task termination time distribution is assumed. In this work, we propose a theory for continuous-time model-based reinforcement learning generalized to arbitrary discount functions. This formulation covers the case in which there is a non-exponential random termination time. We derive a Hamilton-Jacobi-Bellman (HJB) equation characterizing the optimal policy and describe how it can be solved using a collocation method, which uses deep learning for function approximation. Further, we show how the inverse RL problem can be approached, in which one tries to recover properties of the discount function given decision data. We validate the applicability of our proposed approach on two simulated problems. Our approach opens the way for the analysis of human discounting in sequential decision-making tasks.
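The link between hyperbolic discounting and a random termination time can be illustrated with a small numerical check. The sketch below is not taken from the paper; it assumes one well-known construction in which the task ends at a constant but unknown hazard rate, with an exponential prior of mean k over that rate. Under this assumption the probability of surviving until time t is 1 / (1 + k t), which is the hyperbolic discount function. The function names and the parameters `k`, `rate`, `n_steps`, and `lam_max_factor` are hypothetical and chosen only for illustration.

```python
import math


def exponential_discount(t, rate):
    """Standard RL discount in continuous time: exp(-rate * t)."""
    return math.exp(-rate * t)


def hyperbolic_discount(t, k):
    """Hyperbolic discount 1 / (1 + k * t)."""
    return 1.0 / (1.0 + k * t)


def survival_under_exponential_prior(t, k, n_steps=100_000, lam_max_factor=40.0):
    """Approximate E[exp(-lam * t)] for a hazard rate lam ~ Exponential(mean=k).

    Assumption (illustrative, not the paper's setup): the task terminates
    at an unknown constant hazard rate lam. The integral over lam is
    truncated at lam_max_factor * k and evaluated with the midpoint rule.
    """
    lam_max = lam_max_factor * k
    h = lam_max / n_steps
    total = 0.0
    for i in range(n_steps):
        lam = (i + 0.5) * h                    # midpoint of the i-th cell
        prior = math.exp(-lam / k) / k         # exponential prior density
        total += math.exp(-lam * t) * prior * h
    return total


if __name__ == "__main__":
    k = 1.0
    for t in (0.0, 1.0, 5.0, 10.0):
        print(
            t,
            hyperbolic_discount(t, k),
            survival_under_exponential_prior(t, k),
            exponential_discount(t, k),
        )
```

Running the script prints, for each time t, the hyperbolic discount, the numerically integrated survival probability, and an exponential discount with the same initial slope. The first two columns should agree closely, while the exponential discount decays much faster at long horizons.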
