LG Feb 8, 2024

Off-policy Distributional Q($λ$): Distributional RL without Importance Sampling

Yunhao Tang, Mark Rowland, Rémi Munos, Bernardo Ávila Pires, Will Dabney

arXiv:2402.05766v12.61 citations h-index: 64

Originality: Incremental advance

AI Analysis

This paper addresses a specific challenge in reinforcement learning, off-policy distributional evaluation, but the contribution appears incremental because it builds on existing distributional methods.

The paper tackles the problem of off-policy distributional reinforcement learning by introducing a new algorithm that avoids importance sampling, and it shows promising results on deep RL benchmarks when combined with the C51 agent.

We introduce off-policy distributional Q($λ$), a new addition to the family of off-policy distributional evaluation algorithms. Off-policy distributional Q($λ$) does not apply importance sampling for off-policy learning, which introduces intriguing interactions with signed measures. Such unique properties distinguish distributional Q($λ$) from other existing alternatives such as distributional Retrace. We characterize the algorithmic properties of distributional Q($λ$) and validate theoretical insights with tabular experiments. We show how distributional Q($λ$)-C51, a combination of Q($λ$) with the C51 agent, exhibits promising results on deep RL benchmarks.
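
For intuition about learning off-policy without importance sampling, the classical scalar (expected-value) Q($λ$) return can be sketched as follows. This is not the distributional operator from the paper: it is a minimal tabular illustration under assumed conventions, and the function name `q_lambda_return`, its arguments, and the array layouts are all hypothetical. The point it shows is that each TD error is weighted by $(γλ)^t$ only, with no ratio between target and behavior policy probabilities.

```python
import numpy as np

def q_lambda_return(q, rewards, states, actions, pi, gamma=0.99, lam=0.7):
    """Scalar off-policy Q(lambda) target for (states[0], actions[0]).

    Illustrative sketch only (hypothetical names and layouts):
    q:       array [num_states, num_actions], current value estimates
    rewards: length-T sequence; rewards[t] follows (states[t], actions[t])
    states:  length-(T+1) sequence of visited state indices
    actions: length-T sequence of actions taken by the behavior policy
    pi:      array [num_states, num_actions], target policy probabilities
    """
    target = q[states[0], actions[0]]
    for t in range(len(rewards)):
        # Expected next-state value under the target policy pi.
        expected_next = np.dot(pi[states[t + 1]], q[states[t + 1]])
        # TD error along the behavior trajectory.
        td_error = rewards[t] + gamma * expected_next - q[states[t], actions[t]]
        # The trace coefficient is (gamma * lam)^t: no importance ratio appears.
        target += (gamma * lam) ** t * td_error
    return target
```

Setting `lam=0` reduces this to the one-step expected TD target, while larger values of `lam` propagate information from later steps of the behavior trajectory without any reweighting.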
