LG ML

Jun 6, 2019

Classical Policy Gradient: Preserving Bellman's Principle of Optimality

Philip S. Thomas, Scott M. Jordan, Yash Chandak, Chris Nota, James Kostas

arXiv:1906.03063v1

1.81 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses a theoretical gap for researchers in reinforcement learning, but appears incremental as it modifies an existing framework without broad empirical validation.

The authors align policy gradient methods with Bellman's principle of optimality in finite-horizon episodic Markov decision processes by proposing a new objective function and deriving an expression for its gradient.

We propose a new objective function for finite-horizon episodic Markov decision processes that better captures Bellman's principle of optimality, and provide an expression for the gradient of the objective.
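The abstract does not state the proposed objective, so the following is only a background sketch of the principle it refers to, not the paper's method. It runs standard backward induction on a small finite-horizon MDP: the action chosen at each time step is optimal for the remaining steps from every state, which is the property Bellman's principle of optimality describes. The two-state MDP, the function name `backward_induction`, and the array layouts `P[a][s][s2]` and `R[s][a]` are all hypothetical choices made for illustration.

```python
def backward_induction(P, R, horizon):
    """Finite-horizon dynamic programming (illustrative sketch).

    P[a][s][s2]: probability of moving from state s to s2 under action a.
    R[s][a]:     immediate reward for taking action a in state s.
    Returns V (values per time step and state) and a time-dependent policy.
    """
    num_states = len(R)
    num_actions = len(R[0])
    # V[horizon] is all zeros: no reward can be collected after the last step.
    V = [[0.0] * num_states for _ in range(horizon + 1)]
    policy = [[0] * num_states for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):
        for s in range(num_states):
            # One-step lookahead: reward now plus expected value of the tail.
            q = [
                R[s][a] + sum(P[a][s][s2] * V[t + 1][s2] for s2 in range(num_states))
                for a in range(num_actions)
            ]
            best = max(range(num_actions), key=lambda a: q[a])  # ties -> lowest index
            policy[t][s] = best
            V[t][s] = q[best]
    return V, policy


# Hypothetical toy MDP: action 0 stays put, action 1 switches state.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
]
# Only staying in state 1 pays a reward of 1.
R = [
    [0.0, 0.0],
    [1.0, 0.0],
]
V, policy = backward_induction(P, R, horizon=3)
print(V[0])    # optimal expected return from each state at t = 0
print(policy)  # optimal action per time step and state
```

In this toy problem the computed policy is optimal from both states at every time step, not only from a particular start state. An objective that weights only the start-state distribution does not by itself require that.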
