LGAIOct 25, 2023

Pitfall of Optimism: Distributional Reinforcement Learning by Randomizing Risk Criterion

arXiv:2310.16546v39 citationsh-index: 5
Originality Incremental advance
AI Analysis

This addresses a specific issue in reinforcement learning for AI agents, offering an incremental improvement over prior distribution-based algorithms.

The paper tackles the problem of biased exploration in distributional reinforcement learning caused by using estimated variance for optimism, and presents a novel algorithm that randomizes risk criterion to avoid one-sided risk tendencies, achieving convergence guarantees and outperforming existing methods in Atari 55 games.

Distributional reinforcement learning algorithms have attempted to utilize estimated uncertainty for exploration, such as optimism in the face of uncertainty. However, using the estimated variance for optimistic exploration may cause biased data collection and hinder convergence or performance. In this paper, we present a novel distributional reinforcement learning algorithm that selects actions by randomizing risk criterion to avoid one-sided tendency on risk. We provide a perturbed distributional Bellman optimality operator by distorting the risk measure and prove the convergence and optimality of the proposed method with the weaker contraction property. Our theoretical results support that the proposed method does not fall into biased exploration and is guaranteed to converge to an optimal return. Finally, we empirically show that our method outperforms other existing distribution-based algorithms in various environments including Atari 55 games.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes