ML · LG · Oct 1, 2021

A Cramér Distance perspective on Quantile Regression based Distributional Reinforcement Learning

arXiv:2110.00535v2 · 9 citations
Originality: Incremental advance
AI Analysis

This work provides theoretical insights for researchers in distributional reinforcement learning, but it is incremental as it builds on existing quantile regression methods without introducing a new paradigm.

The paper connects loss functions used in distributional reinforcement learning. It proves that the Cramér distance projection coincides with the 1-Wasserstein projection and that, under non-crossing constraints, the squared Cramér loss and the quantile regression loss have collinear gradients. It also proposes a low-complexity algorithm to compute the Cramér distance.

Distributional reinforcement learning (DRL) extends the value-based approach by approximating the full distribution over future returns instead of the mean only, providing a richer signal that leads to improved performance. Quantile Regression (QR) based methods like QR-DQN project arbitrary distributions onto a parametric subset of staircase distributions by minimizing the 1-Wasserstein distance. However, because the gradients of the 1-Wasserstein distance are biased, the quantile regression loss is used for training instead: it has the same minimizer and yields unbiased gradients. Non-crossing constraints on the quantiles have been shown to improve the performance of QR-DQN for uncertainty-based exploration strategies. The contribution of this work is in the setting of fixed quantile levels and is twofold. First, we prove that the Cramér distance yields a projection that coincides with the 1-Wasserstein one and that, under non-crossing constraints, the squared Cramér and the quantile regression losses yield collinear gradients, shedding light on the connection between these important elements of DRL. Second, we propose a low-complexity algorithm to compute the Cramér distance.
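
To make the two losses named in the abstract concrete, here is a minimal NumPy sketch. It assumes uniformly weighted atoms (staircase distributions) and the midpoint quantile levels commonly used in QR-DQN. All function names are illustrative. The Cramér routine is a generic sort-based computation and should not be read as the low-complexity algorithm the paper proposes.

```python
import numpy as np


def quantile_midpoints(n):
    """Fixed quantile levels tau_i = (2i + 1) / (2n), i = 0..n-1 (assumed convention)."""
    return (2 * np.arange(n) + 1) / (2 * n)


def quantile_regression_loss(theta, samples, taus):
    """Pinball loss of the quantile estimates `theta` against target `samples`.

    rho_tau(u) = u * (tau - 1[u < 0]) with u = sample - theta, summed over
    quantiles and averaged over samples.
    """
    u = samples[:, None] - theta[None, :]            # shape (m, n)
    return np.mean(np.sum(u * (taus[None, :] - (u < 0)), axis=1))


def squared_cramer_distance(x, y):
    """Squared Cramér distance: integral of (F_x(z) - F_y(z))^2 dz.

    `x` and `y` hold the atoms of two uniformly weighted empirical
    distributions. Both CDFs are piecewise constant between consecutive
    merged atoms, so the integral reduces to a finite sum.
    """
    x = np.sort(x)
    y = np.sort(y)
    z = np.sort(np.concatenate([x, y]))
    # CDF values on each interval [z_k, z_{k+1})
    cdf_x = np.searchsorted(x, z[:-1], side="right") / len(x)
    cdf_y = np.searchsorted(y, z[:-1], side="right") / len(y)
    return np.sum((cdf_x - cdf_y) ** 2 * np.diff(z))
```

For example, with target samples `[0, 1, 2, 3]` and two quantile levels `[0.25, 0.75]`, the estimates `[0.5, 2.5]` give a lower quantile regression loss than the collapsed estimates `[1.5, 1.5]`, which illustrates that the loss pulls each atom toward its own quantile of the target.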

Code Implementations: 1 repo
