LG · AI · Feb 11, 2023

UGAE: A Novel Approach to Non-exponential Discounting

arXiv:2302.05740v1 · 4 citations · h-index: 51
Originality: Incremental advance
AI Analysis

The paper addresses a bottleneck in building human-like reinforcement learning agents by enabling non-exponential discounting in on-policy algorithms. The advance is incremental because it builds on existing advantage estimation methods.

The paper tackles the problem of applying non-exponential discounting in reinforcement learning by proposing Universal Generalized Advantage Estimation (UGAE), which enables computation of advantage values with arbitrary discounting, and shows experimentally that agents trained with UGAE and Beta-weighted discounting outperform Monte Carlo baselines on standard benchmarks.

The discounting mechanism in Reinforcement Learning determines the relative importance of future and present rewards. While exponential discounting is widely used in practice, non-exponential discounting methods that align with human behavior are often desirable for creating human-like agents. However, non-exponential discounting methods cannot be directly applied in modern on-policy actor-critic algorithms. To address this issue, we propose Universal Generalized Advantage Estimation (UGAE), which allows for the computation of GAE advantage values with arbitrary discounting. Additionally, we introduce Beta-weighted discounting, a continuous interpolation between exponential and hyperbolic discounting, to increase flexibility in choosing a discounting method. To showcase the utility of UGAE, we provide an analysis of the properties of various discounting methods. We also show experimentally that agents with non-exponential discounting trained via UGAE outperform variants trained with Monte Carlo advantage estimation. Through analysis of various discounting methods and experiments, we demonstrate the superior performance of UGAE with Beta-weighted discounting over the Monte Carlo baseline on standard RL benchmarks. UGAE is simple and easily integrated into any advantage-based algorithm as a replacement for the standard recursive GAE.
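To make the idea of GAE with arbitrary discounting concrete, the sketch below replaces the powers of the discount factor in the usual GAE sum with entries of a user-supplied discount sequence. It is an illustration under stated assumptions, not necessarily the paper's exact estimator. The function names, the explicit O(T^2) double loop, and the reuse of one value estimate at every time offset are choices made for this example. The `beta_moment_discount` helper assumes that Beta-weighted discounting can be realized as the moments of a Beta(alpha, beta) distribution over the discount factor. The abstract does not spell out this parametrization.

```python
def exponential_discount(gamma, horizon):
    """Discount sequence gamma**l for l = 0..horizon."""
    return [gamma ** l for l in range(horizon + 1)]


def hyperbolic_discount(k, horizon):
    """Discount sequence 1 / (1 + k*l) for l = 0..horizon."""
    return [1.0 / (1.0 + k * l) for l in range(horizon + 1)]


def beta_moment_discount(alpha, beta, horizon):
    """Hypothetical helper: l-th raw moment of Beta(alpha, beta),
    i.e. prod_{j<l} (alpha + j) / (alpha + beta + j).
    Assumed here as one way to interpolate between the two shapes above."""
    out = [1.0]
    for j in range(horizon):
        out.append(out[-1] * (alpha + j) / (alpha + beta + j))
    return out


def arbitrary_discount_gae(rewards, values, discounts, lam):
    """Lambda-weighted advantage estimate with an arbitrary discount sequence.

    rewards:   T rewards r_0..r_{T-1}
    values:    T+1 value estimates; values[T] is the bootstrap value
               (use 0.0 if the episode terminated)
    discounts: at least T+1 entries, discounts[0] == 1.0
    lam:       GAE mixing parameter in [0, 1]
    """
    T = len(rewards)
    if len(values) != T + 1 or len(discounts) < T + 1:
        raise ValueError("need T+1 values and at least T+1 discounts")
    advantages = []
    for t in range(T):
        total = 0.0
        for l in range(T - t):
            # Generalized TD residual at offset l from time t.
            # Simplification: the same value estimate is used at every offset.
            delta = (discounts[l] * rewards[t + l]
                     + discounts[l + 1] * values[t + l + 1]
                     - discounts[l] * values[t + l])
            total += (lam ** l) * delta
        advantages.append(total)
    return advantages
```

With `exponential_discount`, this reduces to the standard recursive GAE. With `lam = 1`, it reduces to a Monte Carlo advantage under the chosen discount sequence. Under the assumption above, `beta_moment_discount(1 / k, 1.0, horizon)` reproduces `hyperbolic_discount(k, horizon)`, because the product telescopes.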
