LG · AI · RO · SY · Jul 3, 2021

Examining average and discounted reward optimality criteria in reinforcement learning

arXiv:2107.01348v2 · 24 citations
AI Analysis

This work addresses the foundational question of which optimality criterion to use in RL. It offers insights for researchers and practitioners working in environments that lack a natural notion of discounting, though its contribution is incremental.

The paper revisits average and discounted reward optimality criteria in reinforcement learning, highlighting issues with artificial discount factors in non-discounting environments and advocating for average-reward methods as discounting-free alternatives.
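A tiny worked case shows how an artificial discount factor can interfere. The sketch below uses a hypothetical deterministic MDP that is not from the paper. From the start state, the "greedy" policy collects a one-off reward of 10 and then 0 forever, while the "patient" policy collects 1 at every step. All names (`POLICIES`, `discounted_value`, `average_reward`, `best_policy`) are illustrative.

```python
# Hypothetical deterministic MDP (illustrative, not from the paper): from the
# start state the agent commits to one of two reward streams.
POLICIES = {
    "greedy": (10.0, 0.0),   # (first reward r_0, reward r_t for every t >= 1)
    "patient": (1.0, 1.0),   # reward 1 at every step, including the first
}


def discounted_value(first, loop, gamma):
    """Sum of gamma**t * r_t: r_0 = first, then a geometric tail of `loop`."""
    return first + gamma * loop / (1.0 - gamma)


def average_reward(first, loop):
    """Long-run average lim (1/T) * sum r_t; the one-off `first` term vanishes."""
    return loop


def best_policy(score):
    """Name of the policy with the highest score under the given criterion."""
    return max(POLICIES, key=lambda name: score(*POLICIES[name]))


if __name__ == "__main__":
    for gamma in (0.5, 0.8, 0.95, 0.99):
        choice = best_policy(lambda first, loop: discounted_value(first, loop, gamma))
        print(f"gamma={gamma}: {choice}")
    print(f"average reward: {best_policy(average_reward)}")
```

Under these assumptions, the discounted criterion picks "greedy" for gamma = 0.5 and 0.8 and "patient" for 0.95 and 0.99. The two policies tie at gamma = 0.9, where 1/(1 - gamma) = 10. Below that value, discounting prefers the policy whose long-run average reward is 0, whereas the average-reward criterion picks "patient" without any discount factor to tune.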

In reinforcement learning (RL), the goal is to obtain an optimal policy, for which the optimality criterion is fundamentally important. Two major optimality criteria are average and discounted rewards. While the latter is more popular, it is problematic to apply in environments without an inherent notion of discounting. This motivates us to revisit a) the progression of optimality criteria in dynamic programming, b) justification for and complication of an artificial discount factor, and c) benefits of directly maximizing the average reward criterion, which is discounting-free. Our contributions include a thorough examination of the relationship between average and discounted rewards, as well as a discussion of their pros and cons in RL. We emphasize that average-reward RL methods possess the ingredient and mechanism for applying a family of discounting-free optimality criteria (Veinott, 1969) to RL.
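The relationship between the two criteria can be made concrete numerically. For a finite Markov chain under a fixed policy, a standard result states that (1 - γ) v_γ approaches the gain g (the average reward) as γ → 1. It also gives the expansion v_γ = g/(1 - γ) + h + O(1 - γ), where h is the bias.

The sketch below checks this on a hypothetical two-state Markov reward process. It assumes the chain is irreducible and aperiodic, so g is the same in every state and power iteration finds the stationary distribution. The transition matrix, rewards, and helper names are illustrative and not taken from the paper.

```python
# Hypothetical two-state Markov reward process (a fixed policy already applied).
P = [[0.9, 0.1],
     [0.5, 0.5]]   # row-stochastic transition matrix
r = [0.0, 1.0]     # expected one-step reward in each state


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            factor = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= factor * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def discounted_values(P, r, gamma):
    """Discounted value vector v_gamma = (I - gamma * P)^{-1} r."""
    n = len(P)
    A = [[(1.0 if i == j else 0.0) - gamma * P[i][j] for j in range(n)]
         for i in range(n)]
    return solve(A, r)


def gain(P, r, iters=1000):
    """Average reward g = pi . r, with pi found by power iteration.

    Assumes an irreducible, aperiodic chain so that the iteration converges
    and the gain is identical in every state.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return sum(p * ri for p, ri in zip(pi, r))


g = gain(P, r)  # stationary distribution is (5/6, 1/6), so g = 1/6

if __name__ == "__main__":
    for gamma in (0.9, 0.99, 0.999):
        v = discounted_values(P, r, gamma)
        scaled = [(1.0 - gamma) * vi for vi in v]       # tends to [g, g]
        offset = [vi - g / (1.0 - gamma) for vi in v]   # tends to the bias h
        print(gamma, scaled, offset)
```

For this chain, the scaled values approach g = 1/6 in both states. The offset approaches roughly (-0.28, 1.39), which is the bias h. Gain and bias are the first two members of Veinott's family of discounting-free criteria: (-1)-discount optimality is gain optimality and 0-discount optimality is bias optimality. This is why average-reward methods, which estimate g and h directly, can target those criteria without choosing a γ.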

