LG · Oct 24, 2022

On Many-Actions Policy Gradient

arXiv:2210.13011v5 · 3.31 citations · h-index: 34 · Has Code

Originality: Incremental advance

AI Analysis

Aimed at reinforcement learning researchers and practitioners, this work addresses a focused problem, the variance of policy gradients estimated with many action samples per state, and offers an incremental improvement over existing many-actions methods.

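The paper studies the variance of stochastic policy gradients (SPGs) when many actions are sampled per state and derives a condition for when this yields lower variance than a single-action agent with a proportionally extended trajectory. It then proposes Model-Based Many-Actions (MBMA), which uses a dynamics model for many-actions sampling and yields lower bias than, and comparable variance to, SPG estimated from model-simulated rollouts. As a result, MBMA achieves better sample efficiency and higher returns on a range of continuous-action environments than model-free, many-actions, and model-based on-policy SPG baselines.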
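The variance question can be made concrete with a small numerical sketch. The setting is entirely hypothetical: a one-parameter Gaussian policy, independently drawn 1-D states, and a known action-value function `q_value`. `spg_estimate` is a plain score-function estimator, not the paper's method. It illustrates only the generic effect behind many-actions sampling: averaging N actions per state shrinks the action-sampling part of the gradient variance roughly by 1/N, while the state-sampling part stays fixed.

```python
import random
import statistics
from typing import Tuple

# Hypothetical toy setting, chosen only for illustration (not from the paper):
#   state  s ~ N(0, 1), drawn independently
#   policy a ~ N(MU, SIGMA^2) with the single parameter MU
#   action value Q(s, a) = K * s - (a - s)^2, known exactly
# Then J(MU) = E_s E_a[Q(s, a)] = -(MU^2 + 1) - SIGMA^2, so dJ/dMU = -2 * MU.
MU, SIGMA, K = 0.5, 1.0, 2.0


def q_value(s: float, a: float) -> float:
    return K * s - (a - s) ** 2


def spg_estimate(rng: random.Random, num_states: int, actions_per_state: int) -> float:
    """Score-function estimate of dJ/dMU, averaged over num_states sampled
    states and actions_per_state sampled actions at each state."""
    total = 0.0
    for _ in range(num_states):
        s = rng.gauss(0.0, 1.0)
        for _ in range(actions_per_state):
            a = rng.gauss(MU, SIGMA)
            score = (a - MU) / SIGMA**2  # d/dMU of log N(a; MU, SIGMA^2)
            total += score * q_value(s, a)
    return total / (num_states * actions_per_state)


def estimator_stats(num_states: int, actions_per_state: int,
                    repeats: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Empirical mean and variance of the estimator over independent repeats."""
    rng = random.Random(seed)
    draws = [spg_estimate(rng, num_states, actions_per_state) for _ in range(repeats)]
    return statistics.fmean(draws), statistics.variance(draws)


if __name__ == "__main__":
    print("exact gradient:", -2 * MU)
    # Fixed number of states; more actions per state only shrinks the action term.
    for n in (1, 4, 16):
        mean, var = estimator_stats(num_states=32, actions_per_state=n)
        print(f"N={n:2d}  mean={mean:+.3f}  variance={var:.4f}")
```

For this toy the estimator variance has a closed form, (4 + 45.0625/N)/M for M states and N actions per state. With M = 32 it therefore falls from about 1.53 at N = 1 to about 0.21 at N = 16, and it can never go below the state-sampling floor 4/32 = 0.125. Because the toy draws states independently, it cannot say when many actions beat a single-action agent with a proportionally extended trajectory. That comparison is what the paper's many-actions optimality condition addresses.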

We study the variance of stochastic policy gradients (SPGs) with many action samples per state. We derive a many-actions optimality condition, which determines when many-actions SPG yields lower variance than a single-action agent with a proportionally extended trajectory. We propose Model-Based Many-Actions (MBMA), an approach leveraging dynamics models for many-actions sampling in the context of SPG. MBMA addresses issues associated with existing implementations of many-actions SPG and yields lower bias than, and comparable variance to, SPG estimated from states in model-simulated rollouts. We find that the bias and variance structure of MBMA matches that predicted by theory. As a result, MBMA achieves improved sample efficiency and higher returns on a range of continuous-action environments as compared to model-free, many-actions, and model-based on-policy SPG baselines.
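One way to picture using a dynamics model for many-actions sampling is to keep the states that were actually visited and let a model score extra actions at each of them. The sketch below does this under stated assumptions. `toy_model` (a one-step dynamics model), `toy_critic` (a state-value function), the one-step estimate Q(s, a) ≈ r + γV(s'), and the leave-one-out baseline are all hypothetical, illustrative choices. They are not the authors' MBMA implementation, whose details are in the paper.

```python
import random
import statistics
from typing import Callable, Sequence, Tuple

# Hypothetical stand-ins; none of these come from the paper.
GAMMA = 0.9
SIGMA = 0.5  # fixed policy standard deviation


def toy_model(s: float, a: float) -> Tuple[float, float]:
    """Hypothetical one-step dynamics model: returns (next_state, reward)."""
    s_next = s + a
    return s_next, -s_next**2


def toy_critic(s: float) -> float:
    """Hypothetical state-value estimate V(s)."""
    return -s**2


def model_many_actions_grad(
    theta: float,
    real_states: Sequence[float],
    n_actions: int,
    model: Callable[[float, float], Tuple[float, float]],
    critic: Callable[[float], float],
    rng: random.Random,
) -> float:
    """Score-function gradient of a Gaussian policy a ~ N(theta * s, SIGMA^2)
    w.r.t. theta, using n_actions model-evaluated actions at each real state."""
    assert n_actions >= 2, "leave-one-out baseline needs at least two actions"
    grad = 0.0
    for s in real_states:
        actions = [rng.gauss(theta * s, SIGMA) for _ in range(n_actions)]
        # One-step model lookahead: Q(s, a) ~ r + GAMMA * V(s')
        q_hats = []
        for a in actions:
            s_next, r = model(s, a)
            q_hats.append(r + GAMMA * critic(s_next))
        q_sum = sum(q_hats)
        for a, q in zip(actions, q_hats):
            # Leave-one-out baseline is independent of this action, so no bias.
            baseline = (q_sum - q) / (n_actions - 1)
            score = (a - theta * s) * s / SIGMA**2  # d/dtheta log N(a; theta*s, SIGMA^2)
            grad += score * (q - baseline)
    return grad / (len(real_states) * n_actions)


if __name__ == "__main__":
    rng = random.Random(0)
    states = [1.0, -2.0, 0.5]  # stand-ins for states visited in the real environment
    theta = -0.5
    estimates = [model_many_actions_grad(theta, states, 64, toy_model, toy_critic, rng)
                 for _ in range(200)]
    # Closed form for this toy: -2 * (1 + GAMMA) * (1 + theta) * mean(s^2)
    exact = -2 * (1 + GAMMA) * (1 + theta) * statistics.fmean([x * x for x in states])
    print(f"mean estimate {statistics.fmean(estimates):.3f}  exact {exact:.3f}")
```

The toy model and critic are exact closed-form functions, so the estimator is unbiased for the gradient of E[r + γV(s')] under the policy. The averaged estimates should therefore land near the closed-form value of about -3.325. With a learned model, model or critic error enters Q(s, a) and biases the gradient. Keeping real states rather than simulating whole rollouts is one intuition for why such an approach might accumulate less bias, but this sketch does not test that claim.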
