Generalized Gaussian Temporal Difference Error for Uncertainty-aware Reinforcement Learning
This work addresses uncertainty-aware reinforcement learning for AI systems; its contribution is incremental, refining existing methods with a more flexible error distribution.
The paper tackles inaccurate uncertainty estimation in temporal difference learning by introducing a generalized Gaussian error modeling framework that incorporates kurtosis, leading to significant performance gains in policy gradient algorithms.
Conventional uncertainty-aware temporal difference (TD) learning often assumes a zero-mean Gaussian distribution for TD errors, leading to inaccurate error representations and compromised uncertainty estimation. We introduce a novel framework for generalized Gaussian error modeling in deep reinforcement learning to enhance the flexibility of error distribution modeling by incorporating an additional higher-order moment, particularly kurtosis, thereby improving the estimation and mitigation of data-dependent aleatoric uncertainty. We examine the influence of the shape parameter of the generalized Gaussian distribution (GGD) on aleatoric uncertainty and provide a closed-form expression that demonstrates an inverse relationship between uncertainty and the shape parameter. Additionally, we propose a theoretically grounded weighting scheme to address epistemic uncertainty by fully leveraging the GGD. We refine batch inverse variance weighting with bias reduction and kurtosis considerations, enhancing robustness. Experiments with policy gradient algorithms demonstrate significant performance gains.
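The role of the shape parameter can be illustrated with the standard textbook moments of a zero-mean GGD with density proportional to exp(-(|x|/alpha)^beta). The sketch below is not the paper's implementation, and its closed-form expression may be parameterized differently. The function names and the symbols `alpha` (scale) and `beta` (shape) are assumptions chosen for illustration.

```python
import math


def ggd_variance(alpha: float, beta: float) -> float:
    """Variance of a GGD: alpha^2 * Gamma(3/beta) / Gamma(1/beta)."""
    # lgamma keeps the ratio numerically stable for small beta.
    return alpha ** 2 * math.exp(math.lgamma(3.0 / beta) - math.lgamma(1.0 / beta))


def ggd_excess_kurtosis(beta: float) -> float:
    """Excess kurtosis: Gamma(5/beta) * Gamma(1/beta) / Gamma(3/beta)^2 - 3."""
    log_ratio = (
        math.lgamma(5.0 / beta)
        + math.lgamma(1.0 / beta)
        - 2.0 * math.lgamma(3.0 / beta)
    )
    return math.exp(log_ratio) - 3.0


def ggd_nll(td_error: float, alpha: float, beta: float) -> float:
    """Negative log-likelihood of one TD error under a zero-mean GGD."""
    log_normalizer = math.log(2.0 * alpha) + math.lgamma(1.0 / beta) - math.log(beta)
    return log_normalizer + (abs(td_error) / alpha) ** beta


if __name__ == "__main__":
    # At fixed scale, variance shrinks as the shape parameter grows.
    for beta in (0.5, 1.0, 2.0, 4.0):
        print(beta, ggd_variance(1.0, beta), ggd_excess_kurtosis(beta))
```

With `beta = 2` the GGD reduces to a Gaussian (excess kurtosis 0) and with `beta = 1` to a Laplace distribution (excess kurtosis 3). At fixed `alpha`, a smaller `beta` gives heavier tails and larger variance, which is consistent with the inverse relationship between uncertainty and the shape parameter described above.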