A Robust Quantile Huber Loss With Interpretable Parameter Adjustment In Distributional Reinforcement Learning
This work addresses a specific bottleneck in distributional RL, the choice of the threshold parameter in the quantile Huber loss, by providing a more robust and interpretable loss function. The contribution is incremental, since it builds on the existing quantile Huber loss.
The paper tackles threshold selection in the quantile Huber loss used in distributional RL, which is often suboptimal and does not generalize across tasks. It introduces a generalized loss derived from the Wasserstein distance between Gaussian distributions. The new loss is more robust to outliers and lets the parameter be adjusted according to the noise in the data. It is validated empirically on Atari games and on a hedging strategy.
Distributional Reinforcement Learning (RL) estimates the return distribution mainly by learning quantile values through minimizing the quantile Huber loss function. This loss involves a threshold parameter that is often selected heuristically or via hyperparameter search, which may not generalize well and can be suboptimal. This paper introduces a generalized quantile Huber loss function derived from the Wasserstein distance (WD) between Gaussian distributions, capturing noise in the predicted (current) and target (Bellman-updated) quantile values. Compared to the classical quantile Huber loss, the proposed loss function enhances robustness against outliers. Notably, the classical Huber loss function can be seen as an approximation of our proposed loss, which enables parameter adjustment by approximating the amount of noise in the data during the learning process. Empirical tests on Atari games, a common application in distributional RL, and on a recent hedging strategy using distributional RL validate the effectiveness of our proposed loss function and its potential for parameter adjustment in distributional RL. The implementation of the proposed loss function is available here.
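To make the relationship between the two losses concrete, the following is a minimal sketch, not the authors' implementation. It assumes the 1-Wasserstein distance between two one-dimensional Gaussians, which has a known closed form (the mean of a folded normal distribution), and compares it with the classical Huber loss. The function names, the symbols `u` (difference of means) and `b` (absolute difference of standard deviations), and the choice to set the Huber threshold equal to `b` are illustrative assumptions; the paper's exact parameterization may differ.

```python
import math


def huber(u, k=1.0):
    """Classical Huber loss: quadratic for |u| <= k, linear beyond."""
    a = abs(u)
    return 0.5 * u * u if a <= k else k * (a - 0.5 * k)


def gaussian_w1(u, b):
    """1-Wasserstein distance between two 1-D Gaussians.

    u is the difference of the means and b > 0 the absolute difference of
    the standard deviations (both names are illustrative). The value equals
    E|Z| for Z ~ N(u, b^2), the mean of a folded normal distribution.
    """
    a = abs(u)
    # erf(a / (b*sqrt(2))) equals 1 - 2*Phi(-a/b), with Phi the normal CDF.
    linear_part = a * math.erf(a / (b * math.sqrt(2.0)))
    bump = b * math.sqrt(2.0 / math.pi) * math.exp(-a * a / (2.0 * b * b))
    return linear_part + bump


def quantile_weighted(u, tau, base_loss):
    """Asymmetric quantile weighting |tau - 1{u < 0}| applied to base_loss(u)."""
    indicator = 1.0 if u < 0 else 0.0
    return abs(tau - indicator) * base_loss(u)


if __name__ == "__main__":
    b = 1.0  # assumed noise scale; also used as the Huber threshold here
    offset = gaussian_w1(0.0, b)  # subtracted so the smooth loss is 0 at u = 0
    for u in (0.1, 0.5, 1.0, 3.0, 10.0):
        smooth = gaussian_w1(u, b) - offset
        print(f"u={u:5.1f}  huber={huber(u, k=b):8.4f}  gaussian_w1={smooth:8.4f}")
```

In this sketch the Gaussian-based loss is approximately quadratic for |u| much smaller than b and approaches |u| minus a constant for |u| much larger than b, which is the same qualitative shape as the Huber loss. That is one way to read the abstract's statement that the threshold can be tied to the amount of noise in the data. Passing either function to `quantile_weighted` gives the asymmetric form used in quantile regression.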