LG · ML · Mar 18, 2019

Exploiting Hierarchy for Learning and Transfer in KL-regularized RL

arXiv:1903.07438v2 · 45 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of improving efficiency and transferability in reinforcement learning for continuous control, though it appears incremental as it builds on existing KL-regularized frameworks.

The paper tackles the challenge of incorporating prior knowledge and exploiting reusable structure in reinforcement learning by using KL-regularized objectives with latent variables in policies and default behaviors, resulting in faster learning and transfer on continuous control tasks.

As reinforcement learning agents are tasked with solving more challenging and diverse tasks, the ability to incorporate prior knowledge into the learning system and to exploit reusable structure in solution space is likely to become increasingly important. The KL-regularized expected reward objective constitutes one possible tool to this end. It introduces an additional component, a default or prior behavior, which can be learned alongside the policy and as such partially transforms the reinforcement learning problem into one of behavior modelling. In this work we consider the implications of this framework in cases where both the policy and default behavior are augmented with latent variables. We discuss how the resulting hierarchical structures can be used to implement different inductive biases and how their modularity can benefit transfer. Empirically we find that they can lead to faster learning and transfer on a range of continuous control tasks.
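As a rough illustration of the KL-regularized expected reward objective described above, the sketch below scores a single trajectory by its discounted reward minus a discounted KL penalty between the policy and the default behavior. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the discrete action distributions, and the values of `alpha` and `gamma` are all hypothetical, and the latent variables that the paper adds to both the policy and the default behavior are left out.

```python
import math


def kl_categorical(p, q):
    """KL(p || q) for two discrete distributions given as probability lists.

    Assumes q is nonzero wherever p is nonzero.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def kl_regularized_return(rewards, policy_probs, default_probs,
                          alpha=0.1, gamma=0.99):
    """Discounted sum of (reward - alpha * KL(policy || default)) over one trajectory.

    rewards:       per-step rewards
    policy_probs:  per-step action distributions of the policy
    default_probs: per-step action distributions of the default behavior
    alpha, gamma:  illustrative KL weight and discount (assumed values)
    """
    total = 0.0
    for t, (r, p, q) in enumerate(zip(rewards, policy_probs, default_probs)):
        total += (gamma ** t) * (r - alpha * kl_categorical(p, q))
    return total


# Toy two-step trajectory with two actions per step.
rewards = [1.0, 1.0]
default = [[0.5, 0.5], [0.5, 0.5]]
same_as_default = [[0.5, 0.5], [0.5, 0.5]]
deterministic = [[1.0, 0.0], [1.0, 0.0]]

matched = kl_regularized_return(rewards, same_as_default, default, gamma=0.5)
penalized = kl_regularized_return(rewards, deterministic, default, gamma=0.5)
```

In this toy setup a policy that matches the default behavior pays no penalty, while a policy that departs from it is charged in proportion to `alpha`, which is the sense in which the default behavior acts as a learned prior.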

