LG · AI · ML · Feb 21, 2018

Variational Inference for Policy Gradient

arXiv:1802.07833v2 · 4 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses uncertainty estimation in reinforcement learning policies. It is incremental, building on existing Stein Variational Inference and policy gradient methods.

The authors aim to improve policy gradient methods in reinforcement learning by using variational inference to draw samples from the posterior distribution over policy parameters. They report improved performance for vanilla policy gradient, TRPO, and PPO with Bayesian Neural Network parameterizations.
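The summary names three policy-gradient variants. As a hedged reference point, here are the standard vanilla policy-gradient (REINFORCE-style) surrogate and PPO's clipped surrogate objective, computed from per-sample log-probabilities and advantages. These are the textbook objectives, not the authors' code: how the paper feeds Bayesian Neural Network parameter samples into them is not shown, and the function names and the default `clip_eps=0.2` are illustrative assumptions.

```python
import math


def vanilla_pg_surrogate(log_probs, advantages):
    """REINFORCE-style surrogate: mean of log pi(a|s) * A(s, a).

    Its gradient w.r.t. the policy parameters is the vanilla policy-gradient estimate.
    """
    return sum(lp * adv for lp, adv in zip(log_probs, advantages)) / len(log_probs)


def ppo_clip_surrogate(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """PPO clipped surrogate: mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio r = pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = math.exp(new_lp - old_lp)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum makes the objective a pessimistic bound on the unclipped one.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a ratio of 1.6 on a positive advantage, the clipped term caps the gain at 1.2 times the advantage, while a ratio of 0.4 on a negative advantage is still penalized at the clipped value of 0.8; this is what limits the size of each PPO update. TRPO instead enforces a KL-divergence trust region and is not sketched here.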

Inspired by the seminal work on Stein Variational Inference and Stein Variational Policy Gradient, we derived a method to generate samples from the posterior variational parameter distribution by explicitly minimizing the KL divergence to the target distribution in an amortized fashion. We then applied this variational inference technique to vanilla policy gradient, TRPO, and PPO with Bayesian Neural Network parameterizations for reinforcement learning problems.
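The abstract builds on Stein Variational Inference. For orientation, here is a minimal sketch of the standard Stein Variational Gradient Descent (SVGD) particle update of Liu and Wang, in one dimension with an RBF kernel and the median-heuristic bandwidth. It is not the authors' amortized method, which explicitly minimizes the KL divergence; the Gaussian target, the names `svgd_step` and `run_demo`, and all constants are illustrative assumptions.

```python
import math
import statistics


def svgd_step(particles, grad_log_p, step_size):
    """One SVGD update on 1-D particles with an RBF kernel k(x, y) = exp(-(x - y)^2 / h)."""
    n = len(particles)
    # Median heuristic: h = median(squared pairwise distance) / log(n).
    sq_dists = [(xi - xj) ** 2
                for i, xi in enumerate(particles) for xj in particles[i + 1:]]
    h = max(statistics.median(sq_dists) / math.log(n), 1e-8)
    grads = [grad_log_p(x) for x in particles]
    updated = []
    for xi in particles:
        phi = 0.0
        for xj, gj in zip(particles, grads):
            k = math.exp(-((xj - xi) ** 2) / h)
            # Driving term k(x_j, x_i) * grad log p(x_j) pulls particles toward high density;
            # repulsive term grad_{x_j} k(x_j, x_i) = 2 (x_i - x_j) / h * k pushes them apart.
            phi += k * gj + 2.0 * (xi - xj) / h * k
        updated.append(xi + step_size * phi / n)
    return updated


def run_demo(mu=2.0, sigma=1.0, n=30, steps=1000, step_size=0.05):
    """Transport n particles toward a Gaussian N(mu, sigma^2) target (illustrative only)."""
    def grad_log_p(x):
        # Score of N(mu, sigma^2): d/dx log p(x) = -(x - mu) / sigma^2.
        return -(x - mu) / sigma ** 2

    # Deterministic start: particles spread evenly over [-1, 1], away from the target mean.
    particles = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    for _ in range(steps):
        particles = svgd_step(particles, grad_log_p, step_size)
    return statistics.mean(particles), statistics.pstdev(particles)
```

Calling `run_demo()` moves 30 particles from [-1, 1] toward an N(2, 1) target: the driving term shifts their mean, while the kernel-gradient term keeps them from collapsing onto the mode, so the particle set approximates the whole distribution rather than a single point estimate.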

