LGMLAug 9, 2018

Policy Optimization as Wasserstein Gradient Flows

arXiv:1808.03030v117.275 citations
Originality Incremental advance
AI Analysis

This work addresses the foundational problem of understanding and improving policy optimization in reinforcement learning for researchers and practitioners, though it appears incremental as it builds on existing algorithms.

The authors tackled the unclear mathematical principles of policy optimization in reinforcement learning by interpreting it as Wasserstein gradient flows in probability-measure space, which makes it convex under certain conditions, and they developed efficient algorithms that empirically achieved better performance compared to related methods.

Policy optimization is a core component of reinforcement learning (RL), and most existing RL methods directly optimize parameters of a policy based on maximizing the expected total reward, or its surrogate. Though often achieving encouraging empirical success, its underlying mathematical principle on {\em policy-distribution} optimization is unclear. We place policy optimization into the space of probability measures, and interpret it as Wasserstein gradient flows. On the probability-measure space, under specified circumstances, policy optimization becomes a convex problem in terms of distribution optimization. To make optimization feasible, we develop efficient algorithms by numerically solving the corresponding discrete gradient flows. Our technique is applicable to several RL settings, and is related to many state-of-the-art policy-optimization algorithms. Empirical results verify the effectiveness of our framework, often obtaining better performance compared to related algorithms.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes