LG AI | Jul 14, 2020

Efficient Empowerment Estimation for Unsupervised Stabilization

Ruihan Zhao, Kevin Lu, Pieter Abbeel, Stas Tiomkin

arXiv:2007.07356v2 | 10.6 | 13 citations

Originality: Highly original

AI Analysis

This work addresses a key bottleneck in applying empowerment for intrinsically motivated learning in control systems, offering a more efficient and stable solution that could facilitate broader adoption in robotics and AI.

The paper tackles the challenge of efficiently estimating empowerment (the mutual information between agent actuators and future states) for unsupervised stabilization of dynamical systems. It proposes a trainable representation of the system as a Gaussian channel, which yields an unbiased empowerment estimator computed by convex optimization. Compared to existing variational lower bound approaches, the method shows lower sample complexity and greater training stability, and it can estimate empowerment from images.

Intrinsically motivated artificial agents learn advantageous behavior without externally-provided rewards. Previously, it was shown that maximizing mutual information between agent actuators and future states, known as the empowerment principle, enables unsupervised stabilization of dynamical systems at upright positions, which is a prototypical intrinsically motivated behavior for upright standing and walking. This follows from the coincidence between the objective of stabilization and the objective of empowerment. Unfortunately, sample-based estimation of this kind of mutual information is challenging. Recently, various variational lower bounds (VLBs) on empowerment have been proposed as solutions; however, they are often biased, unstable in training, and have high sample complexity. In this work, we propose an alternative solution based on a trainable representation of a dynamical system as a Gaussian channel, which allows us to efficiently calculate an unbiased estimator of empowerment by convex optimization. We demonstrate our solution for sample-based unsupervised stabilization on different dynamical control systems and show the advantages of our method by comparing it to the existing VLB approaches. Specifically, we show that our method has a lower sample complexity, is more stable in training, possesses the essential properties of the empowerment function, and allows estimation of empowerment from images. Consequently, our method opens a path to wider and easier adoption of empowerment for various applications.
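To illustrate why a Gaussian channel representation makes empowerment tractable, the sketch below computes the capacity of a linear Gaussian channel y = G a + noise by water-filling over the singular values of G, which is the standard closed-form solution of the underlying convex power-allocation problem. This is a minimal sketch and not the authors' implementation: the function name `gaussian_channel_empowerment`, the total power budget `power`, the isotropic noise variance `noise_var`, and the fixed matrix `G` are all assumptions for illustration (the abstract does not specify how the channel is parameterized or learned).

```python
import numpy as np


def gaussian_channel_empowerment(G, power=1.0, noise_var=1.0):
    """Capacity in nats of y = G a + eta, eta ~ N(0, noise_var * I),
    subject to E[||a||^2] <= power (hypothetical helper, water-filling)."""
    # Singular values of G define independent parallel sub-channels.
    s = np.linalg.svd(np.asarray(G, dtype=float), compute_uv=False)
    gains = np.sort(s[s > 1e-12] ** 2 / noise_var)[::-1]  # descending
    if gains.size == 0 or power <= 0:
        return 0.0

    inv = 1.0 / gains  # noise-to-gain "floor" of each sub-channel
    # Find the largest set of sub-channels that receive positive power.
    for k in range(gains.size, 0, -1):
        water_level = (power + inv[:k].sum()) / k
        if water_level > inv[k - 1]:
            break

    # Optimal allocation: fill each active sub-channel up to the water level.
    alloc = np.maximum(water_level - inv, 0.0)
    return float(0.5 * np.sum(np.log1p(alloc * gains)))


if __name__ == "__main__":
    G = np.diag([2.0, 1.0])  # toy channel matrix, assumed for the example
    print(gaussian_channel_empowerment(G, power=1.0))
```

In this toy setting the power budget is spread across both sub-channels when it is large enough, and concentrated on the strongest one when it is small. A state-dependent channel matrix would give a state-dependent empowerment value from the same routine.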
