LG · ML · Oct 7, 2020

Variational Intrinsic Control Revisited

arXiv:2010.03281v2 · 12 citations
Originality: Incremental advance
AI Analysis

This work addresses a specific issue in unsupervised reinforcement learning: a bias in the intrinsic reward of variational intrinsic control (VIC) under stochastic dynamics. It is incremental because it builds directly on prior VIC methods.

The paper addresses bias in the intrinsic reward of implicit-option variational intrinsic control (VIC) in stochastic environments, which causes convergence to suboptimal solutions. It proposes two corrections, one based on a transitional probability model and one on a Gaussian mixture model, to achieve maximal empowerment.

In this paper, we revisit variational intrinsic control (VIC), an unsupervised reinforcement learning method for finding the largest set of intrinsic options available to an agent. In the original work by Gregor et al. (2016), two VIC algorithms were proposed: one that represents the options explicitly, and the other that does it implicitly. We show that the intrinsic reward used in the latter is subject to bias in stochastic environments, causing convergence to suboptimal solutions. To correct this behavior and achieve the maximal empowerment, we propose two methods respectively based on the transitional probability model and Gaussian mixture model. We substantiate our claims through rigorous mathematical derivations and experimental analyses.
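As a rough illustration of the quantity being maximized, the sketch below computes the mutual information between options and final states for a fixed start state, along with the explicit-option intrinsic reward log q(Ω | s_f) − log p(Ω) from Gregor et al. (2016). It assumes a toy two-option, two-state setting with tabular probabilities, and it sets q to the exact Bayes posterior rather than a learned model. The function names and the toy transition tables are hypothetical, and the sketch does not reproduce the implicit-option bias or either of the paper's corrections.

```python
import math


def mutual_information(p_option, p_final_given_option):
    """I(Omega; s_f | s_0) in nats for one fixed start state s_0.

    p_option[o] is p(o | s_0); p_final_given_option[o][s] is p(s_f = s | s_0, o).
    """
    n_states = len(p_final_given_option[0])
    # Marginal over final states: p(s_f) = sum_o p(o) p(s_f | o)
    p_final = [
        sum(p_o * row[s] for p_o, row in zip(p_option, p_final_given_option))
        for s in range(n_states)
    ]
    mi = 0.0
    for p_o, row in zip(p_option, p_final_given_option):
        for s, p_s in enumerate(row):
            if p_o > 0 and p_s > 0:
                mi += p_o * p_s * math.log(p_s / p_final[s])
    return mi


def intrinsic_reward(option, final_state, p_option, p_final_given_option):
    """log q(option | s_f) - log p(option), with q taken as the exact posterior.

    Assumption: a real agent would use a learned q; the exact posterior is used
    here only to keep the toy example self-contained.
    """
    joint = [p_o * row[final_state] for p_o, row in zip(p_option, p_final_given_option)]
    posterior = joint[option] / sum(joint)
    return math.log(posterior) - math.log(p_option[option])


# Hypothetical toy tables: two options, two final states.
uniform = [0.5, 0.5]
deterministic = [[1.0, 0.0], [0.0, 1.0]]  # each option reaches its own state
noisy = [[0.9, 0.1], [0.1, 0.9]]          # 10% chance of ending in the other state

mi_det = mutual_information(uniform, deterministic)
mi_noisy = mutual_information(uniform, noisy)

# With the exact posterior, the expected intrinsic reward equals the mutual information.
expected_reward_noisy = sum(
    uniform[o] * noisy[o][s] * intrinsic_reward(o, s, uniform, noisy)
    for o in range(2)
    for s in range(2)
)
```

In the deterministic table the two options are perfectly distinguishable from the final state, so the mutual information reaches log 2 nats; stochastic transitions lower it. This is the regime in which the paper argues that the implicit-option reward needs correcting.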

