ML · Feb 22, 2018

An Analysis of Categorical Distributional Reinforcement Learning

arXiv:1802.08163v1 · 123 citations
Originality: Incremental advance
AI Analysis

This work provides foundational theoretical insights for researchers in reinforcement learning, addressing a gap in understanding CDRL methods that have shown state-of-the-art empirical performance.

The paper tackles the lack of theoretical understanding of categorical distributional reinforcement learning (CDRL) algorithms, such as C51, by introducing an analytical framework that proves convergence for sample-based CDRL algorithms and connects them to the Cramér distance.

Distributional approaches to value-based reinforcement learning model the entire distribution of returns, rather than just their expected values, and have recently been shown to yield state-of-the-art empirical performance. This was demonstrated by the recently proposed C51 algorithm, based on categorical distributional reinforcement learning (CDRL) [Bellemare et al., 2017]. However, the theoretical properties of CDRL algorithms are not yet well understood. In this paper, we introduce a framework to analyse CDRL algorithms, establish the importance of the projected distributional Bellman operator in distributional RL, draw fundamental connections between CDRL and the Cramér distance, and give a proof of convergence for sample-based categorical distributional reinforcement learning algorithms.
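As a rough illustration of the objects the abstract names, the sketch below applies a single sample-based distributional Bellman backup to a categorical distribution, projects the result back onto the fixed support, and measures the Cramér distance between two categorical distributions. It assumes an evenly spaced support and the usual C51-style projection that splits each atom's mass between its two nearest support points; the function names, the reward, and the discount factor are illustrative choices, not taken from the paper.

```python
import numpy as np

def project_onto_support(atoms, probs, support):
    """Project a discrete distribution onto an evenly spaced support.

    Each atom's mass is split linearly between the two neighbouring
    support points; atoms outside the support are clipped to its ends.
    """
    support = np.asarray(support, dtype=float)
    v_min, v_max = support[0], support[-1]
    delta = support[1] - support[0]  # assumes even spacing
    out = np.zeros_like(support)
    for x, p in zip(atoms, probs):
        x = min(max(float(x), v_min), v_max)
        pos = (x - v_min) / delta
        lo = int(np.floor(pos))
        hi = min(lo + 1, len(support) - 1)
        w_hi = pos - lo  # fraction of mass sent to the upper neighbour
        out[lo] += p * (1.0 - w_hi)
        out[hi] += p * w_hi
    return out

def cramer_distance(p, q, support):
    """Cramer (l2) distance between two distributions on the same
    evenly spaced support: the L2 norm of the difference of their CDFs."""
    support = np.asarray(support, dtype=float)
    delta = support[1] - support[0]
    cdf_diff = np.cumsum(p) - np.cumsum(q)
    return float(np.sqrt(delta * np.sum(cdf_diff ** 2)))

# Illustrative values (assumptions, not from the paper).
support = np.linspace(0.0, 4.0, 5)
next_probs = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
reward, gamma = 0.5, 0.5

# Sample backup: shift and scale the atoms, then project back.
backed_up_atoms = reward + gamma * support
projected = project_onto_support(backed_up_atoms, next_probs, support)
print(projected, cramer_distance(projected, next_probs, support))
```

When no atom is clipped, this projection leaves the mean of the backed-up distribution unchanged, which is one reason it is a natural fit for value-based methods.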

