LG · AI · ML · Jul 26, 2019

A Unified Bellman Optimality Principle Combining Reward Maximization and Empowerment

arXiv:1907.12392v5 · 33 citations
AI Analysis

This work addresses the challenge of guiding reinforcement learning agents toward good early solutions in complex environments. The contribution is incremental: it builds on existing empowerment and reward-maximization methods.

The paper tackles the problem of combining empowerment, an information-theoretic intrinsic motivation, with extrinsic reward signals in reinforcement learning, proposing a unified Bellman optimality principle that improves initial performance and achieves competitive final results in high-dimensional robotics tasks.

Empowerment is an information-theoretic method that can be used to intrinsically motivate learning agents. It attempts to maximize an agent's control over the environment by encouraging visiting states with a large number of reachable next states. Empowered learning has been shown to lead to complex behaviors, without requiring an explicit reward signal. In this paper, we investigate the use of empowerment in the presence of an extrinsic reward signal. We hypothesize that empowerment can guide reinforcement learning (RL) agents to find good early behavioral solutions by encouraging highly empowered states. We propose a unified Bellman optimality principle for empowered reward maximization. Our empowered reward maximization approach generalizes both Bellman's optimality principle as well as recent information-theoretical extensions to it. We prove uniqueness of the empowered values and show convergence to the optimal solution. We then apply this idea to develop off-policy actor-critic RL algorithms which we validate in high-dimensional continuous robotics domains (MuJoCo). Our methods demonstrate improved initial and competitive final performance compared to model-free state-of-the-art techniques.
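To make the notion of empowerment concrete, here is a minimal sketch of the textbook one-step definition: the channel capacity between an agent's action and the resulting next state, computed for a single fixed state with the Blahut-Arimoto iteration. This is not the paper's algorithm, which embeds empowerment in a Bellman recursion and trains actor-critic networks. The function names, the tabular transition matrix `P`, and the iteration settings are all assumptions made for illustration.

```python
import numpy as np


def _kl_per_action(P, marginal):
    """KL(p(s'|s,a) || p(s')) for every action, using the convention 0 log 0 = 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        log_ratio = np.where(P > 0, np.log(P / marginal), 0.0)
    return np.sum(P * log_ratio, axis=1)


def one_step_empowerment(P, iters=500, tol=1e-12):
    """Blahut-Arimoto estimate of max_w I(A; S' | s) for one fixed state s.

    P is a hypothetical (n_actions, n_next_states) array whose row a holds
    p(s' | s, a). Returns (empowerment in nats, maximizing action distribution).
    """
    P = np.asarray(P, dtype=float)
    n_actions = P.shape[0]
    w = np.full(n_actions, 1.0 / n_actions)  # start from a uniform action distribution

    for _ in range(iters):
        kl = _kl_per_action(P, w @ P)        # w @ P is the induced marginal p(s')
        new_w = w * np.exp(kl)               # up-weight actions with distinctive outcomes
        new_w /= new_w.sum()
        converged = np.max(np.abs(new_w - w)) < tol
        w = new_w
        if converged:
            break

    # Mutual information under the final action distribution.
    kl = _kl_per_action(P, w @ P)
    return float(w @ kl), w


if __name__ == "__main__":
    # Four actions that deterministically reach four distinct next states.
    controllable = np.eye(4)
    # Four actions that all lead to the same outcome distribution.
    uncontrollable = np.tile([0.25, 0.25, 0.25, 0.25], (4, 1))

    print(one_step_empowerment(controllable)[0])
    print(one_step_empowerment(uncontrollable)[0])
```

A state whose four actions reach four distinct next states deterministically has empowerment log 4 nats, while a state where every action produces the same outcome distribution has empowerment zero. That gap is the sense in which empowerment rewards "a large number of reachable next states".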
