ML · LG · Feb 10, 2020

Discrete Action On-Policy Learning with Action-Value Critic

arXiv:2002.03534v2 · 5 citations
AI Analysis

This paper addresses efficiency issues in RL for real-world applications with discrete actions, though it appears incremental because it builds on existing variance control techniques.

The paper tackles reinforcement learning in high-dimensional discrete action spaces, where complexity grows exponentially with the number of action dimensions. It develops a new on-policy algorithm that uses a critic to estimate action-value functions and combines those estimates to control the variance of the policy gradient. On OpenAI Gym benchmarks, the algorithm empirically outperforms related on-policy algorithms that rely on variance control techniques.
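As a rough illustration of why critic estimates can reduce gradient variance, the sketch below compares a plain score-function (REINFORCE) gradient with one that subtracts a critic-derived baseline, for a single categorical policy. This is not the paper's estimator, which combines critic values on correlated actions; the expected-value baseline is a simpler, standard technique swapped in here. The names `softmax` and `score_function_grads`, the logits, and the `q_values` array (standing in for a learned critic) are all assumptions made for the example.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()


def score_function_grads(logits, q_values, n_samples, use_baseline, rng):
    """Per-sample gradient estimates of E_pi[Q(a)] with respect to the logits.

    q_values is a hypothetical stand-in for a critic's action-value estimates.
    """
    pi = softmax(logits)
    k = len(pi)
    actions = rng.choice(k, size=n_samples, p=pi)
    # Gradient of log pi(a) w.r.t. the logits is onehot(a) - pi.
    grad_log_pi = np.eye(k)[actions] - pi
    # Baseline: the policy's expected action value under the critic.
    baseline = pi @ q_values if use_baseline else 0.0
    return (q_values[actions] - baseline)[:, None] * grad_log_pi


rng = np.random.default_rng(0)
logits = np.array([0.1, -0.2, 0.3, 0.0])       # assumed policy parameters
q_values = np.array([10.0, 11.0, 12.0, 13.0])  # assumed critic outputs

pi = softmax(logits)
exact_grad = pi * (q_values - pi @ q_values)   # closed-form gradient

plain = score_function_grads(logits, q_values, 20000, False, rng)
with_baseline = score_function_grads(logits, q_values, 20000, True, rng)

# Total variance summed over the gradient components.
var_plain = plain.var(axis=0).sum()
var_baseline = with_baseline.var(axis=0).sum()
print("total variance without baseline:", var_plain)
print("total variance with baseline:   ", var_baseline)
```

Both estimators target the same gradient; the baseline version only removes the large common offset in the action values, which is where most of the variance comes from in this toy setting.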

Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension, making it challenging to apply existing on-policy gradient based deep RL algorithms efficiently. To effectively operate in multidimensional discrete action spaces, we construct a critic to estimate action-value functions, apply it on correlated actions, and combine these critic estimated action values to control the variance of gradient estimation. We follow rigorous statistical analysis to design how to generate and combine these correlated actions, and how to sparsify the gradients by shutting down the contributions from certain dimensions. These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques. We demonstrate these properties on OpenAI Gym benchmark tasks, and illustrate how discretizing the action space could benefit the exploration phase and hence facilitate convergence to a better local optimal solution thanks to the flexibility of discrete policy.
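To make the exponential growth concrete, here is a minimal sketch of discretizing a continuous action space and counting the resulting actions. It assumes each action dimension is bounded and split into evenly spaced bins, and that the policy factorizes across dimensions; neither detail is stated in the abstract. All function names, the bounds of [-1, 1], and the choice of 6 dimensions with 11 bins are hypothetical.

```python
import numpy as np


def discretize_action_space(low, high, bins_per_dim):
    """Return a (D, K) grid: K evenly spaced values for each of D dimensions."""
    return np.stack([np.linspace(l, h, bins_per_dim) for l, h in zip(low, high)])


def joint_action_count(n_dims, bins_per_dim):
    """Number of distinct joint actions: grows exponentially in n_dims."""
    return bins_per_dim ** n_dims


def factorized_logit_count(n_dims, bins_per_dim):
    """Number of logits if each dimension has its own categorical (assumed)."""
    return n_dims * bins_per_dim


def sample_factorized_action(logits, grid, rng):
    """Sample one bin per dimension from per-dimension softmax distributions."""
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z)
    probs /= probs.sum(axis=1, keepdims=True)
    idx = np.array([rng.choice(len(p), p=p) for p in probs])
    return grid[np.arange(len(idx)), idx], idx


n_dims, bins = 6, 11                       # assumed sizes for illustration
low = -np.ones(n_dims)                     # assumed action bounds
high = np.ones(n_dims)
grid = discretize_action_space(low, high, bins)

rng = np.random.default_rng(0)
logits = np.zeros((n_dims, bins))          # uniform policy per dimension
action, bin_idx = sample_factorized_action(logits, grid, rng)

print("joint actions:    ", joint_action_count(n_dims, bins))
print("factorized logits:", factorized_logit_count(n_dims, bins))
print("sampled action:   ", action)
```

With these assumed sizes the joint space has 11^6 = 1,771,561 actions, while a per-dimension parameterization needs only 66 logits, which is why the structure of the policy and of the gradient estimator matters in the multidimensional setting.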

Code Implementations: 1 repo