AI · Jan 3, 2017

A K-fold Method for Baseline Estimation in Policy Gradient Algorithms

arXiv:1701.00867v1
Originality: Incremental advance
AI Analysis

This work addresses a specific issue in policy gradient reinforcement learning, namely how the baseline is fitted, but it appears incremental because it builds on existing baseline techniques.

The paper tackles underfitting and overfitting in baseline estimation for policy gradient algorithms. It develops a K-fold method in which the hyperparameter K adjusts the bias-variance trade-off of the baseline estimates, and it demonstrates the method with two state-of-the-art policy gradient algorithms on three MuJoCo locomotive control tasks.

The high variance issue in unbiased policy-gradient methods such as VPG and REINFORCE is typically mitigated by adding a baseline. However, the baseline fitting itself suffers from the underfitting or the overfitting problem. In this paper, we develop a K-fold method for baseline estimation in policy gradient algorithms. The parameter K is the baseline estimation hyperparameter that can adjust the bias-variance trade-off in the baseline estimates. We demonstrate the usefulness of our approach via two state-of-the-art policy gradient algorithms on three MuJoCo locomotive control tasks.
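The abstract does not spell out the procedure, so the following is only a minimal sketch of one plausible reading of K-fold baseline estimation: each sample's baseline value is predicted by a model fitted on the other K-1 folds, so no sample is scored by a baseline trained on its own return. The function name `kfold_baseline`, the ridge-regularized linear baseline, the per-sample (rather than per-trajectory) split, and the `ridge` and `seed` parameters are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def kfold_baseline(features, returns, k, ridge=1e-3, seed=0):
    """Return out-of-fold baseline predictions, one per sample.

    Hypothetical helper: a linear baseline is fitted by ridge regression
    on K-1 folds and evaluated on the held-out fold.
    """
    features = np.asarray(features, dtype=float)
    returns = np.asarray(returns, dtype=float)
    n, d = features.shape
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= number of samples")

    # Shuffle sample indices and split them into k nearly equal folds.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)

    baseline = np.empty(n)
    for held_out in folds:
        # Fit on every sample that is not in the held-out fold.
        train = np.setdiff1d(np.arange(n), held_out)
        X, y = features[train], returns[train]
        weights = np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ y)
        # Predict only for the held-out samples.
        baseline[held_out] = features[held_out] @ weights
    return baseline


# Usage: advantages are the returns minus the out-of-fold baseline.
state_features = np.random.default_rng(0).normal(size=(100, 5))
empirical_returns = state_features @ np.arange(1.0, 6.0)
advantages = empirical_returns - kfold_baseline(state_features, empirical_returns, k=5)
```

Under this reading, a larger K means each baseline model sees more of the batch (each fit uses a fraction (K-1)/K of the samples), which is one way K could shift the bias-variance trade-off mentioned in the abstract.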

