LG Feb 15, 2021

Distributionally-Constrained Policy Optimization via Unbalanced Optimal Transport

arXiv:2102.07889v1
Originality: Incremental advance
AI Analysis

This paper addresses constrained policy optimization for reinforcement learning practitioners. The contribution appears incremental because it builds on existing optimal transport methods.

The paper tackles constrained policy optimization in reinforcement learning by formulating it as unbalanced optimal transport over occupancy measures, proposing a Bregman divergence-based objective optimized with Dykstra's algorithm, and demonstrating its effectiveness in applications.

We consider constrained policy optimization in Reinforcement Learning, where the constraints are in the form of marginals on state visitations and global action executions. Given these distributions, we formulate policy optimization as unbalanced optimal transport over the space of occupancy measures. We propose a general-purpose RL objective based on Bregman divergence and optimize it using Dykstra's algorithm. The approach admits an actor-critic algorithm for when the state or action space is large and only samples from the marginals are available. We discuss applications of our approach and provide demonstrations to show the effectiveness of our algorithm.
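To make the idea of marginal constraints on an occupancy measure concrete, here is a minimal sketch in pure Python. It is not the paper's algorithm: it assumes a one-step (bandit-like) setting and ignores the Bellman flow constraints that a true occupancy measure must satisfy. It uses the standard entropic scaling iterations for KL-penalized marginals, which reduce to Sinkhorn (alternating KL projections) when the penalty weight is infinite. The function name `scaling_iterations` and the parameters `eps` (entropy weight) and `rho` (marginal penalty weight) are illustrative assumptions, not names from the paper.

```python
import math


def scaling_iterations(reward, state_marginal, action_marginal,
                       eps=0.5, rho=math.inf, n_iters=500):
    """Return a nonnegative state-action measure mu[s][a].

    Hypothetical helper for illustration only. It favors high-reward
    pairs while pulling the row sums toward state_marginal and the
    column sums toward action_marginal. With rho = inf the marginals
    are hard constraints; a finite rho only penalizes deviation from
    them in KL, which is the "unbalanced" case.
    """
    n_s, n_a = len(reward), len(reward[0])
    # Gibbs kernel: high-reward (state, action) pairs receive more mass.
    kernel = [[math.exp(reward[s][a] / eps) for a in range(n_a)]
              for s in range(n_s)]
    # Exponent 1 gives plain Sinkhorn; a smaller exponent softens the constraint.
    power = 1.0 if math.isinf(rho) else rho / (rho + eps)
    u = [1.0] * n_s
    v = [1.0] * n_a
    for _ in range(n_iters):
        # Rescale rows toward the state-visitation marginal.
        for s in range(n_s):
            row = sum(kernel[s][a] * v[a] for a in range(n_a))
            u[s] = (state_marginal[s] / row) ** power
        # Rescale columns toward the global action marginal.
        for a in range(n_a):
            col = sum(kernel[s][a] * u[s] for s in range(n_s))
            v[a] = (action_marginal[a] / col) ** power
    return [[u[s] * kernel[s][a] * v[a] for a in range(n_a)]
            for s in range(n_s)]


# Toy problem: 3 states, 2 actions (all numbers are made up).
reward = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
state_marginal = [0.5, 0.3, 0.2]
action_marginal = [0.6, 0.4]
mu = scaling_iterations(reward, state_marginal, action_marginal)

# A policy can be read off by normalizing each row of mu.
policy = [[m / sum(row) for m in row] for row in mu]
```

Passing a finite `rho` relaxes the marginal constraints, so the row and column sums of the result only approximate the targets. The full method described in the abstract would also need the occupancy measure to be consistent with the environment dynamics, which this sketch leaves out.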

