LG
Jun 11, 2021

Taylor Expansion of Discount Factors

arXiv:2106.06170v2
8 citations
Originality: Incremental advance
AI Analysis

This work tackles a practical issue in RL for algorithm developers, offering incremental improvements to policy optimization methods.

The paper addresses the discrepancy between discount factors used for value function estimation and evaluation objectives in reinforcement learning, proposing a family of interpolated objectives that lead to empirical performance gains.

In practical reinforcement learning (RL), the discount factor used for estimating value functions often differs from that used for defining the evaluation objective. In this work, we study the effect that this discrepancy of discount factors has during learning, and discover a family of objectives that interpolate between the value functions of two distinct discount factors. Our analysis suggests new ways to estimate value functions and perform policy optimization updates, which demonstrate empirical performance gains. This framework also leads to new insights on commonly used deep RL heuristic modifications to policy optimization algorithms.
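To make the idea of interpolating between two discount factors concrete, the sketch below uses a standard linear-algebra identity rather than anything quoted from the listing: for a fixed policy with transition matrix P and reward vector r, writing I - γ'P = (I - γP) - (γ' - γ)P gives V_γ' = Σ_k ((γ' - γ)(I - γP)^{-1} P)^k V_γ, which converges for γ < γ' < 1. Truncating the sum at order K gives a family of estimates that starts at V_γ (K = 0) and approaches V_γ' as K grows. The three-state chain, the rewards, and all function names are hypothetical, chosen only for illustration, and this is not necessarily the exact formulation used in the paper.

```python
# Illustrative sketch only: the toy chain, rewards, and helper names are assumptions.

def matvec(P, v):
    """Multiply a row-stochastic matrix P (list of rows) by a vector v."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]


def resolvent(P, x, gamma, iters=2000):
    """Approximate (I - gamma * P)^{-1} x by iterating v <- x + gamma * P v."""
    v = list(x)
    for _ in range(iters):
        Pv = matvec(P, v)
        v = [xi + gamma * pvi for xi, pvi in zip(x, Pv)]
    return v


def taylor_value(P, r, gamma, gamma_prime, order):
    """Sum the first `order` + 1 terms of the expansion of V_gamma' around V_gamma."""
    term = resolvent(P, r, gamma)  # zeroth-order term: V_gamma
    total = list(term)
    for _ in range(order):
        # Next term: (gamma' - gamma) * (I - gamma P)^{-1} P applied to the previous term.
        next_term = resolvent(P, matvec(P, term), gamma)
        term = [(gamma_prime - gamma) * t for t in next_term]
        total = [a + b for a, b in zip(total, term)]
    return total


def max_error(u, v):
    """Largest absolute elementwise difference between two vectors."""
    return max(abs(a - b) for a, b in zip(u, v))


# Hypothetical 3-state Markov chain induced by some fixed policy.
P = [[0.9, 0.1, 0.0],
     [0.0, 0.5, 0.5],
     [0.3, 0.0, 0.7]]
r = [1.0, 0.0, 2.0]
gamma, gamma_prime = 0.9, 0.95

target = resolvent(P, r, gamma_prime)  # V_gamma', computed directly
for order in (0, 1, 5, 30):
    approx = taylor_value(P, r, gamma, gamma_prime, order)
    print(order, max_error(approx, target))
```

The printed error shrinks as the order increases, at a rate governed by (γ' - γ)/(1 - γ), so a small number of terms already moves the estimate most of the way from V_γ toward V_γ' when the two discount factors are close.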

