LG · AI · May 16, 2022

Enforcing KL Regularization in General Tsallis Entropy Reinforcement Learning via Advantage Learning

Lingwei Zhu, Zheng Chen, Eiji Uchibe, Takamitsu Matsubara

arXiv:2205.07885v1 · 1.81 citations · h-index: 26

Originality: Incremental advance

AI Analysis

This work addresses the tradeoff between flexibility and performance in reinforcement learning, a concern for both researchers and practitioners. It is an incremental advance because it builds on existing frameworks such as Munchausen DQN.

The paper tackles the underperformance of non-Shannon entropies in reinforcement learning, which it attributes to approximation errors. It proposes Tsallis Advantage Learning (TAL), which enforces implicit KL regularization to make learning more robust to those errors. TAL significantly improves over Tsallis-DQN and performs comparably to state-of-the-art Shannon entropy algorithms.
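For readers new to the maximum Tsallis entropy setting, here is a minimal sketch of the Tsallis entropy family. It assumes one common normalization, S_q(π) = (1 − Σ_a π(a)^q) / (q − 1); papers differ by constant factors. As the entropic index q → 1 it recovers the Shannon entropy, and q = 2 gives, up to a constant, the sparse Tsallis entropy. The function name `tsallis_entropy` is hypothetical and does not come from the paper.

```python
import numpy as np

def tsallis_entropy(pi, q):
    """Tsallis entropy S_q(pi) = (1 - sum_a pi(a)**q) / (q - 1).

    q == 1 is handled as the limiting case, the Shannon entropy -sum pi log pi.
    Normalization is an assumption; other papers scale this by a constant.
    """
    pi = np.asarray(pi, dtype=float)
    if q == 1.0:
        nz = pi[pi > 0]  # treat 0 * log 0 as 0
        return float(-np.sum(nz * np.log(nz)))
    return float((1.0 - np.sum(pi ** q)) / (q - 1.0))

# Entropy of one distribution across several entropic indices q.
pi = [0.5, 0.3, 0.2]
for q in (1.0, 1.0 + 1e-6, 2.0, 3.0):
    print(q, tsallis_entropy(pi, q))
```

Values of q close to 1 should print nearly the Shannon value. Per the abstract, TAL targets Tsallis entropies whose entropy-maximizing policy has no closed form.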

The maximum Tsallis entropy (MTE) framework in reinforcement learning has recently gained popularity by virtue of its flexible modeling choices, which include the widely used Shannon entropy and the sparse entropy. However, non-Shannon entropies suffer from approximation error and consequent underperformance, due either to their sensitivity or to the lack of a closed-form policy expression. To improve the tradeoff between flexibility and empirical performance, we propose to strengthen their error-robustness by enforcing implicit Kullback-Leibler (KL) regularization in MTE, motivated by Munchausen DQN (MDQN). We do so by drawing a connection between MDQN and advantage learning, through which MDQN is shown to fail to generalize to the MTE framework. Extensive experiments verify that the proposed method, Tsallis Advantage Learning (TAL), not only significantly improves upon Tsallis-DQN for various non-closed-form Tsallis entropies but also performs comparably to state-of-the-art maximum Shannon entropy algorithms.
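The link between MDQN and advantage learning can be illustrated in the Shannon case with a standard identity. For the Boltzmann policy π = softmax(Q/τ), the Munchausen bonus ατ ln π(a|s) equals α(Q(s,a) − V_τ(s)), where V_τ(s) = τ log Σ_b exp(Q(s,b)/τ). This is a soft advantage, and it tends to the advantage-learning term α(Q(s,a) − max_b Q(s,b)) as τ → 0. The sketch below checks this numerically for a single state. It omits MDQN's clipping of the log-policy and the rest of the Bellman target, and all function names are hypothetical. It is not the TAL update, which the abstract describes only at a high level.

```python
import numpy as np

def soft_value(q, tau):
    """Soft value V_tau = tau * log sum_a exp(q(a) / tau), computed stably."""
    m = np.max(q)
    return float(m + tau * np.log(np.sum(np.exp((q - m) / tau))))

def log_softmax_policy(q, tau):
    """log pi(a) for the Boltzmann policy pi = softmax(q / tau), i.e. (q(a) - V_tau) / tau."""
    return (q - soft_value(q, tau)) / tau

def munchausen_term(q, tau, alpha):
    """Munchausen-style bonus alpha * tau * log pi(a) (no clipping); equals alpha * (q(a) - V_tau)."""
    return alpha * tau * log_softmax_policy(q, tau)

def advantage_term(q, alpha):
    """Advantage-learning bonus alpha * (q(a) - max_b q(b))."""
    return alpha * (q - np.max(q))

# Q-values for one state; as tau shrinks, the two bonuses should coincide.
q_values = np.array([1.0, 2.0, 0.5])
alpha = 0.9
for tau in (1.0, 0.1, 1e-3):
    print(tau, munchausen_term(q_values, tau, alpha), advantage_term(q_values, alpha))
```

Both bonuses are non-positive and vanish only for the greedy action. Per the abstract, MDQN's log-policy form does not carry over to general Tsallis entropies, and TAL works from the advantage-learning view instead.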

