LG · ML · Jun 4, 2020

Visual Transfer for Reinforcement Learning via Wasserstein Domain Confusion

arXiv:2006.03465v1 · 16 citations
AI Analysis

This paper addresses the problem of transferring learned policies across visual domains in reinforcement learning. The contribution is incremental: it builds on prior methods and adds a novel objective.

The paper tackles visual transfer in reinforcement learning by introducing WAPPO, which aligns feature distributions between source and target tasks using a Wasserstein Confusion objective. WAPPO outperforms the prior state of the art in visual transfer on Visual Cartpole and two instantiations of 16 OpenAI Procgen environments.

We introduce Wasserstein Adversarial Proximal Policy Optimization (WAPPO), a novel algorithm for visual transfer in Reinforcement Learning that explicitly learns to align the distributions of extracted features between a source and target task. WAPPO approximates and minimizes the Wasserstein-1 distance between the distributions of features from source and target domains via a novel Wasserstein Confusion objective. WAPPO outperforms the prior state-of-the-art in visual transfer and successfully transfers policies across Visual Cartpole and two instantiations of 16 OpenAI Procgen environments.
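To make the idea of approximating a Wasserstein-1 distance between feature distributions concrete, here is a minimal sketch. It is not the paper's implementation. It relies on the Kantorovich-Rubinstein dual form, in which the distance is the largest gap between the mean critic score on source features and on target features, taken over 1-Lipschitz critics. As a simplifying assumption, the critic here is restricted to unit-norm linear functions, which gives only a lower bound on the true distance; the abstract does not specify the critic, and an adversarially trained network would be the natural choice in practice. All function names and the synthetic Gaussian "features" are hypothetical and used only for illustration.

```python
import math
import random


def mean_vector(feats):
    """Per-dimension mean of a list of feature vectors."""
    n = len(feats)
    return [sum(col) / n for col in zip(*feats)]


def best_linear_critic(source_feats, target_feats):
    """Unit-norm linear critic f(h) = w . h that maximizes the mean gap.

    Assumption: critics are restricted to linear functions with ||w|| <= 1,
    which are 1-Lipschitz in the L2 norm. This is a simplification chosen
    for illustration, not the critic used in the paper.
    """
    src_mean = mean_vector(source_feats)
    tgt_mean = mean_vector(target_feats)
    diff = [s - t for s, t in zip(src_mean, tgt_mean)]
    norm = math.sqrt(sum(d * d for d in diff))
    w = [d / norm for d in diff] if norm > 0 else [0.0] * len(diff)
    return lambda h: sum(wi * hi for wi, hi in zip(w, h))


def critic_gap(critic, source_feats, target_feats):
    """Dual-form estimate: mean critic score on source minus on target."""
    src = sum(critic(h) for h in source_feats) / len(source_feats)
    tgt = sum(critic(h) for h in target_feats) / len(target_feats)
    return src - tgt


def sample_features(rng, n, dim, shift):
    """Synthetic stand-in for extracted features (hypothetical data)."""
    return [[rng.gauss(0.0, 1.0) + shift for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
source = sample_features(rng, 512, 8, 0.0)
shifted_target = sample_features(rng, 512, 8, 2.0)  # misaligned domain
aligned_target = sample_features(rng, 512, 8, 0.0)  # well-aligned domain

gap_shifted = critic_gap(
    best_linear_critic(source, shifted_target), source, shifted_target
)
gap_aligned = critic_gap(
    best_linear_critic(source, aligned_target), source, aligned_target
)

print(f"estimated gap, shifted target: {gap_shifted:.3f}")
print(f"estimated gap, aligned target: {gap_aligned:.3f}")
```

In this sketch the gap is large when the target features are shifted away from the source features and close to zero when the two sets are drawn from the same distribution. A feature extractor trained to shrink such a gap, alongside the usual policy loss, is the general shape of a distribution-alignment objective; how WAPPO combines the two terms is described in the paper itself.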

Code Implementations · 1 repo