LG · AI · Jun 3, 2025

A Differential Perspective on Distributional Reinforcement Learning

arXiv:2506.03333v1 · 5 citations · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses the problem of learning and optimizing the reward an agent receives per time step in reinforcement learning. It is an incremental extension of existing distributional RL techniques.

The authors extend distributional reinforcement learning from the discounted setting to the average-reward setting, developing quantile-based algorithms that learn long-run reward distributions and perform competitively with non-distributional methods.
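For reference, the average-reward quantities named above are usually formalized as below. This is a sketch of standard definitions, assuming a fixed policy π in an aperiodic unichain MDP. Reading the "long-run per-step reward distribution" as the limiting law of R_t is an assumption here, and the paper's own notation and definitions may differ.

```latex
% average reward of policy \pi
\bar{r}_\pi = \lim_{n \to \infty} \frac{1}{n} \sum_{t=1}^{n} \mathbb{E}_\pi[R_t]

% differential return: rewards measured relative to the average reward
G_t = \sum_{k=0}^{\infty} \bigl( R_{t+k+1} - \bar{r}_\pi \bigr)

% long-run per-step reward distribution (assumed: limit of the law of R_t)
R_\infty \sim \lim_{t \to \infty} \Pr_\pi\bigl(R_t \in \cdot\,\bigr)

% quantile-regression (pinball-loss) SGD step toward m quantile levels
\tau_i = \frac{2i - 1}{2m}, \qquad
\theta_i \leftarrow \theta_i + \alpha \bigl( \tau_i - \mathbb{1}\{R_{t+1} < \theta_i\} \bigr),
\quad i = 1, \dots, m
```

Under these definitions, \bar{r}_\pi is the mean of R_\infty, and each θ_i is driven toward the τ_i-quantile of R_\infty.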

To date, distributional reinforcement learning (distributional RL) methods have exclusively focused on the discounted setting, where an agent aims to optimize a potentially discounted sum of rewards over time. In this work, we extend distributional RL to the average-reward setting, where an agent aims to optimize the reward received per time-step. In particular, we utilize a quantile-based approach to develop the first set of algorithms that can successfully learn and/or optimize the long-run per-step reward distribution, as well as the differential return distribution of an average-reward MDP. We derive provably convergent tabular algorithms for both prediction and control, as well as a broader family of algorithms that have appealing scaling properties. Empirically, we find that these algorithms consistently yield competitive performance when compared to their non-distributional equivalents, while also capturing rich information about the long-run reward and return distributions.
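The abstract does not spell out the update rules, so the following is only a minimal Python sketch, not the authors' algorithm. It combines two standard ingredients on a hypothetical two-state Markov reward process (prediction only, no actions): quantile-regression SGD applied to each observed per-step reward, which tracks quantiles of the long-run per-step reward distribution, and differential TD(0), which learns an average-reward estimate together with differential state values. The chain, its rewards, the step sizes, and the names `quantile_step` and `simulate` are all illustrative assumptions.

```python
import random


def quantile_step(theta, x, taus, lr):
    """One quantile-regression (pinball-loss) SGD step toward sample x, per level tau."""
    for i, tau in enumerate(taus):
        indicator = 1.0 if x < theta[i] else 0.0
        # Move up by lr*tau when x >= theta_i, down by lr*(1 - tau) otherwise.
        theta[i] += lr * (tau - indicator)


def simulate(num_steps=100_000, num_quantiles=5, alpha=0.01, eta=0.1, seed=0):
    rng = random.Random(seed)
    # Hypothetical 2-state chain (not from the paper): probability of staying put.
    stay_prob = {0: 0.9, 1: 0.5}

    def sample_reward(s):
        # Hypothetical rewards: state 0 pays 0 or 1 with equal probability,
        # state 1 always pays 4.
        return float(rng.randint(0, 1)) if s == 0 else 4.0

    # Midpoint quantile levels tau_i = (2i - 1) / (2m).
    taus = [(2 * i + 1) / (2 * num_quantiles) for i in range(num_quantiles)]
    reward_quantiles = [0.0] * num_quantiles  # per-step reward quantile estimates
    values = [0.0, 0.0]                       # differential state values
    avg_reward = 0.0                          # average-reward estimate

    s = 0
    for _ in range(num_steps):
        r = sample_reward(s)
        s_next = s if rng.random() < stay_prob[s] else 1 - s
        # Quantile regression on the observed per-step reward.
        quantile_step(reward_quantiles, r, taus, alpha)
        # Differential TD(0): the TD error subtracts the average-reward estimate.
        delta = r - avg_reward + values[s_next] - values[s]
        values[s] += alpha * delta
        avg_reward += eta * alpha * delta
        s = s_next
    return taus, reward_quantiles, values, avg_reward
```

For this chain the stationary distribution puts probability 5/6 on state 0 and 1/6 on state 1, so the per-step reward is 0 with probability 5/12, 1 with probability 5/12, and 4 with probability 1/6. Its quantiles at τ = 0.1, 0.3, 0.5, 0.7, 0.9 are therefore 0, 0, 1, 1, 4, and the average reward is 13/12 ≈ 1.083. Calling `simulate()` should return estimates that hover near these values, with fluctuations set by the constant step sizes. Learning the differential return distribution, which the paper also targets, would need a distributional version of the TD target and is not attempted here.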
