LG | AI | Jul 18, 2024

PG-Rainbow: Using Distributional Reinforcement Learning in Policy Gradient Methods

arXiv:2407.13146v2
Originality: Incremental advance
AI Analysis

The paper addresses a bottleneck in reinforcement learning for game agents, but the advance is incremental because it combines existing techniques (Implicit Quantile Networks and Proximal Policy Optimization).

The paper tackles the sample inefficiency of policy gradient methods by integrating distributional reinforcement learning into Proximal Policy Optimization, showing improved decision-making in Atari-2600 games.

This paper introduces PG-Rainbow, a novel algorithm that combines a distributional reinforcement learning framework with a policy gradient algorithm. Existing policy gradient methods are sample inefficient and rely on the mean of returns when calculating the state-action value function, neglecting the distributional nature of returns in reinforcement learning tasks. To address this issue, we use an Implicit Quantile Network that provides the quantile information of the distribution of rewards to the critic network of the Proximal Policy Optimization algorithm. Our empirical results show that integrating reward distribution information into the policy network lets the agent evaluate the consequences of potential actions in a given state more thoroughly, supporting better-informed decisions. We evaluate the performance of the proposed algorithm in the Atari-2600 game suite, simulated via the Arcade Learning Environment (ALE).
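The abstract's central point, that a mean-only value estimate hides information that quantiles keep, can be illustrated with a small sketch. This is not the paper's implementation: the empirical quantile function stands in for a trained Implicit Quantile Network, and the names `empirical_quantiles` and `critic_input`, the sample returns, the quantile levels, and the idea of simply concatenating quantile values onto the critic's input are all assumptions made for illustration.

```python
import statistics


def empirical_quantiles(samples, taus):
    """Return the empirical quantile of `samples` at each level in `taus`.

    Stand-in for a learned quantile function (assumption: a trained IQN
    would produce these values; here they come from sorted samples).
    """
    ordered = sorted(samples)
    n = len(ordered)
    quantiles = []
    for tau in taus:
        # Simple rank-based index, clamped to the valid range.
        idx = min(n - 1, max(0, int(tau * n)))
        quantiles.append(ordered[idx])
    return quantiles


def critic_input(state_features, action_quantiles):
    """Concatenate state features with the quantile values of every action.

    Hypothetical way of exposing distributional information to a critic.
    """
    flat = [q for quantiles in action_quantiles for q in quantiles]
    return list(state_features) + flat


# Hypothetical sampled returns for two actions in the same state.
safe_returns = [4.0, 5.0, 5.0, 6.0]
risky_returns = [0.0, 0.0, 10.0, 10.0]
taus = [0.1, 0.5, 0.9]  # assumed quantile levels

# Both actions have the same mean return, so a mean-based critic
# cannot tell them apart.
mean_safe = statistics.mean(safe_returns)
mean_risky = statistics.mean(risky_returns)

# The quantiles differ, exposing the spread of outcomes.
q_safe = empirical_quantiles(safe_returns, taus)
q_risky = empirical_quantiles(risky_returns, taus)

# Hypothetical 2-dimensional state features plus 3 quantiles per action.
features = critic_input([0.2, -0.1], [q_safe, q_risky])
```

In this toy setup both actions average 5.0, yet their quantiles differ sharply, which is the kind of signal a critic fed only mean returns would never see.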

