AI | Oct 4, 2019

Discounted Reinforcement Learning Is Not an Optimization Problem

arXiv:1910.02140v3 | 57 citations
Originality: Highly original
AI Analysis

This paper addresses a foundational issue in reinforcement learning theory: a limitation of the standard discounted formulation that matters to researchers working on continuing tasks.

The paper argues that discounted reinforcement learning is fundamentally incompatible with function approximation for control in continuing tasks: in its usual formulation it is not an optimization problem, so no optimal policy exists under function approximation. It encourages adopting rigorous optimization approaches such as maximizing average reward.

Discounted reinforcement learning is fundamentally incompatible with function approximation for control in continuing tasks. It is not an optimization problem in its usual formulation, so when using function approximation there is no optimal policy. We substantiate these claims, then go on to address some misconceptions about discounting and its connection to the average reward formulation. We encourage researchers to adopt rigorous optimization approaches, such as maximizing average reward, for reinforcement learning in continuing tasks.
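
One standard link between discounting and average reward can be checked numerically. The following is a minimal sketch, not the paper's own code: it assumes a hypothetical two-state Markov chain induced by some fixed policy, and the transition matrix, rewards, and function names are all illustrative. It checks the identity that discounted state values, weighted by the stationary distribution of the policy, sum to the average reward divided by 1 - gamma. If that identity holds, the discount rate rescales this weighted objective without changing how policies are ranked.

```python
def stationary_distribution(P, iters=2000):
    """Approximate the stationary distribution of row-stochastic P by power iteration."""
    n = len(P)
    d = [1.0 / n] * n
    for _ in range(iters):
        # One step of d <- d P
        d = [sum(d[i] * P[i][j] for i in range(n)) for j in range(n)]
    return d


def discounted_values(P, r, gamma, iters=2000):
    """Approximate discounted state values by iterating v <- r + gamma * P v."""
    n = len(P)
    v = [0.0] * n
    for _ in range(iters):
        v = [r[i] + gamma * sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
    return v


# Hypothetical chain under a fixed policy (assumed for illustration only).
P = [[0.9, 0.1],
     [0.5, 0.5]]
r = [1.0, 0.0]   # expected one-step reward in each state
gamma = 0.9

d = stationary_distribution(P)
v = discounted_values(P, r, gamma)

avg_reward = sum(d[i] * r[i] for i in range(len(r)))
weighted_value = sum(d[i] * v[i] for i in range(len(v)))

# The two printed quantities should agree up to numerical error.
print(weighted_value, avg_reward / (1.0 - gamma))
```

The sketch covers policy evaluation on a fixed chain only; it does not reproduce the paper's argument about control with function approximation.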
