LG · RO · Nov 18, 2020

Weighted Entropy Modification for Soft Actor-Critic

arXiv:2011.09083v1
AI Analysis

This work addresses the exploration-exploitation balance in reinforcement learning, offering researchers and practitioners an incremental improvement over existing methods.

This paper generalizes the maximum Shannon entropy principle in reinforcement learning to weighted entropy by assigning qualitative weights to state-action pairs. The proposed algorithm, motivated by self-balancing exploration, is reported to achieve state-of-the-art performance on MuJoCo tasks.

We generalize the existing principle of the maximum Shannon entropy in reinforcement learning (RL) to weighted entropy by characterizing the state-action pairs with some qualitative weights, which can be connected with prior knowledge, experience replay, and the evolution process of the policy. We propose an algorithm motivated by self-balancing exploration with the introduced weight function, which leads to state-of-the-art performance on MuJoCo tasks despite its simplicity of implementation.
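
The step from Shannon entropy to weighted entropy can be illustrated on a small discrete action distribution. The following is a minimal sketch, assuming the standard weighted entropy form H_w(p) = -sum_i w_i p_i log p_i. The function name `weighted_entropy` and the example weights are hypothetical and are not taken from the paper, whose actual weight function and continuous-action formulation are not reproduced here.

```python
import math


def weighted_entropy(probs, weights):
    """Return -sum_i w_i * p_i * log(p_i) for a discrete distribution.

    Assumption: this is the textbook weighted entropy, used only to
    illustrate the idea; it is not the paper's exact objective.
    """
    if len(probs) != len(weights):
        raise ValueError("probs and weights must have the same length")
    total = 0.0
    for p, w in zip(probs, weights):
        if p > 0.0:  # by convention, 0 * log(0) contributes nothing
            total -= w * p * math.log(p)
    return total


# A uniform policy over four actions in some fixed state.
policy = [0.25, 0.25, 0.25, 0.25]

# With all weights equal to 1, the ordinary Shannon entropy is recovered.
shannon = weighted_entropy(policy, [1.0, 1.0, 1.0, 1.0])

# Hypothetical weights that emphasize the first two actions, so
# uncertainty over those actions counts more toward the bonus.
emphasized = weighted_entropy(policy, [2.0, 2.0, 0.5, 0.5])

print(shannon, emphasized)
```

With unit weights the function reduces to ordinary Shannon entropy, which is the sense in which the weighted form generalizes the maximum-entropy principle. Non-uniform weights change how much each state-action pair contributes to the entropy term.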

