LG | AI | Feb 12, 2018

Reinforcement Learning with Wasserstein Distance Regularisation, with Applications to Multipolicy Learning

arXiv:1802.03976v2 | 4 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses policy diversity and policy alignment in reinforcement learning. It appears incremental: it applies an existing mathematical tool (the Wasserstein distance) to a known problem without demonstrating broad state-of-the-art improvements.

The paper tackles the problem of learning multiple distinct policies in reinforcement learning by using the Wasserstein distance as a regulariser. The resulting method can either diversify policies or align them with a target distribution.

We describe an application of the Wasserstein distance to reinforcement learning. The Wasserstein distance in question is between the distribution of mappings of a policy's trajectories into some metric space and some other fixed distribution (which may, for example, come from another policy). Different policies induce different distributions, so, given an underlying metric, the Wasserstein distance quantifies how different the policies are. Using a Wasserstein regulariser, one can therefore learn multiple policies that differ from one another in terms of these Wasserstein distances. Changing the sign of the regularisation parameter instead yields a policy whose trajectory mapping distribution is attracted to a given fixed distribution.
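
The idea in the abstract can be illustrated with a small, self-contained sketch. It is not the paper's implementation. It makes several assumptions: the metric space is the real line, the trajectory mapping (`embed`) is simply the final state of a toy 1-D random walk, both samples have the same size (so the empirical 1-Wasserstein distance reduces to a mean absolute difference of sorted samples), and a positive regularisation parameter `lam` rewards distance while a negative one penalises it. All function and parameter names are hypothetical.

```python
import random


def wasserstein_1d(xs, ys):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples.

    For equal-size samples on the real line, the optimal coupling pairs
    the sorted values, so the distance is the mean absolute difference.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def rollout(step_bias, horizon, rng):
    """Toy policy (assumption): a 1-D walk stepping +1 with probability step_bias, else -1."""
    state = 0.0
    trajectory = [state]
    for _ in range(horizon):
        state += 1.0 if rng.random() < step_bias else -1.0
        trajectory.append(state)
    return trajectory


def embed(trajectory):
    """Hypothetical trajectory mapping into the metric space (here: the final state)."""
    return trajectory[-1]


def sample_embeddings(step_bias, n, horizon, rng):
    """Draw n trajectories from the toy policy and map each into the metric space."""
    return [embed(rollout(step_bias, horizon, rng)) for _ in range(n)]


def regularised_objective(mean_return, distance, lam):
    """Return plus lam times the Wasserstein term (sign convention is an assumption).

    lam > 0 rewards being far from the fixed distribution (diversity);
    lam < 0 penalises distance, attracting the policy towards it.
    """
    return mean_return + lam * distance


rng = random.Random(0)
fixed = sample_embeddings(step_bias=0.9, n=200, horizon=20, rng=rng)
similar = sample_embeddings(step_bias=0.9, n=200, horizon=20, rng=rng)
different = sample_embeddings(step_bias=0.1, n=200, horizon=20, rng=rng)

d_near = wasserstein_1d(similar, fixed)
d_far = wasserstein_1d(different, fixed)
```

Here `d_far` should come out much larger than `d_near`, since the two biased walks end in very different places. Under the assumed sign convention, `regularised_objective(r, d_far, lam)` with `lam > 0` favours the policy that differs from the fixed distribution, while `lam < 0` favours the one that resembles it.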

