LG, AI, RO, SY | Nov 4, 2020

Optimal Control-Based Baseline for Guided Exploration in Policy Gradient Methods

arXiv:2011.02073v5
AI Analysis

This work addresses exploration challenges in reinforcement learning for robotics, presenting an incremental improvement over traditional baselines.

The paper aims to improve exploration in policy gradient methods for deep reinforcement learning by introducing an optimal control-based baseline function. The baseline is validated on robot learning tasks and is reported to be effective in sparse reward environments.

In this paper, a novel optimal control-based baseline function is presented for the policy gradient method in deep reinforcement learning (RL). The baseline is obtained by computing the value function of an optimal control problem, which is formed to be closely associated with the RL task. In contrast to the traditional baseline aimed at variance reduction of policy gradient estimates, our work utilizes the optimal control value function to introduce a novel aspect to the role of baseline -- providing guided exploration during policy learning. This aspect is less discussed in prior works. We validate our baseline on robot learning tasks, showing its effectiveness in guided exploration, particularly in sparse reward environments.
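As a rough illustration of the idea in the abstract, the sketch below uses the value function of a simple optimal control problem as the baseline in a REINFORCE-style weight. It is not the paper's construction: the scalar linear-quadratic problem, the sign convention (baseline equal to the negated optimal cost-to-go) and all function names (`riccati_scalar`, `oc_baseline`, `returns_to_go`, `baseline_weights`) are assumptions made here for illustration only.

```python
import numpy as np


def riccati_scalar(a, b, q, r, iters=500):
    """Iterate the scalar discrete-time Riccati equation to a fixed point.

    Assumed toy problem: x' = a*x + b*u with stage cost q*x**2 + r*u**2.
    The optimal cost-to-go is then p * x**2.
    """
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return p


def oc_baseline(x, p):
    """Hypothetical baseline: negated optimal cost-to-go, in reward units."""
    return -p * np.asarray(x, dtype=float) ** 2


def returns_to_go(rewards, gamma=1.0):
    """Discounted return G_t from each time step to the end of the episode."""
    out = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


def baseline_weights(states, rewards, p, gamma=1.0):
    """Weights (G_t - b(x_t)) that multiply grad log pi(a_t | x_t)."""
    return returns_to_go(rewards, gamma) - oc_baseline(states, p)


if __name__ == "__main__":
    p = riccati_scalar(a=1.0, b=1.0, q=1.0, r=1.0)
    # Sparse reward: only the final step is rewarded.
    states = [1.0, 0.5, 0.0]
    rewards = [0.0, 0.0, 1.0]
    print(baseline_weights(states, rewards, p))
```

In this toy setup the baseline depends on the state through the control problem rather than through a learned critic, so the weights differ across states even when the sparse reward makes the returns identical. Whether that matches the guided-exploration effect described in the paper depends on how the authors form their optimal control problem, which the abstract does not specify.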

