LG | May 29, 2025

Equivalence of stochastic and deterministic policy gradients

arXiv:2505.23244v2 | 2 citations | h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses a foundational question in reinforcement learning theory, but the advance is incremental because it builds on existing policy gradient derivations.

The paper examines the relationship between stochastic and deterministic policy gradients in continuous control. It shows that the two are identical in a specific family of MDPs, develops a general procedure for constructing an equivalent MDP with a deterministic policy, and suggests that policy gradient methods can be unified through state value function approximation.

Policy gradients in continuous control have been derived for both stochastic and deterministic policies. Here we study the relationship between the two. In a widely used family of MDPs involving Gaussian control noise and quadratic control costs, we show that the stochastic and deterministic policy gradients, natural gradients, and state value functions are identical, while the state-control value functions are different. We then develop a general procedure for constructing an MDP with a deterministic policy that is equivalent to a given MDP with a stochastic policy. The controls of this new MDP are the sufficient statistics of the stochastic policy in the original MDP. Our results suggest that policy gradient methods can be unified by approximating state value functions rather than state-control value functions.
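The claim in the abstract can be checked numerically on a toy problem. The sketch below is not the paper's construction. It assumes a hypothetical one-step scalar problem with linear dynamics, a quadratic cost, and a Gaussian policy of fixed variance, and every constant and function name in it is illustrative. It compares the likelihood-ratio gradient of the stochastic policy with the chain-rule gradient in an assumed "equivalent" problem whose control is the policy mean and whose cost is the expected cost under the Gaussian. Expectations are computed by Gauss-Hermite quadrature, which is exact for the polynomial integrands used here.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Hypothetical one-step scalar problem; all constants are illustrative.
a, b = 0.9, 0.5    # dynamics: x_next = a*x + b*u
q, r = 1.0, 0.2    # cost: 0.5*r*u^2 for the control, 0.5*q*x_next^2 for the state
sigma = 0.3        # fixed standard deviation of the Gaussian policy
x = 1.5            # fixed start state

# Quadrature rule for E[f(eps)] with eps ~ N(0, 1).
nodes, weights = hermegauss(10)
weights = weights / np.sqrt(2.0 * np.pi)

def total_cost(u):
    """Cost of applying control u in the original problem."""
    x_next = a * x + b * u
    return 0.5 * r * u**2 + 0.5 * q * x_next**2

def stochastic_gradient(theta):
    """Likelihood-ratio gradient for the policy u ~ N(theta*x, sigma^2)."""
    u = theta * x + sigma * nodes
    score = (u - theta * x) * x / sigma**2  # d/dtheta of log N(u; theta*x, sigma^2)
    return float(np.sum(weights * total_cost(u) * score))

def equivalent_cost(mean):
    """Expected cost when the policy mean is treated as the control."""
    return float(np.sum(weights * total_cost(mean + sigma * nodes)))

def deterministic_gradient(theta, h=1e-5):
    """Chain rule through the deterministic policy mean = theta*x."""
    mean = theta * x
    dcost_dmean = (equivalent_cost(mean + h) - equivalent_cost(mean - h)) / (2.0 * h)
    return dcost_dmean * x

theta = -0.4
grad_stochastic = stochastic_gradient(theta)
grad_deterministic = deterministic_gradient(theta)
# Gap between the two costs as functions of their control argument.
q_gap = equivalent_cost(0.0) - total_cost(0.0)
```

In this toy setting the two gradients agree up to finite-difference error, and the expected cost under the stochastic policy equals `equivalent_cost(theta * x)` by definition. The two costs as functions of their control argument differ by the constant `0.5 * sigma**2 * (r + q * b**2)`, which mirrors the distinction the abstract draws between state value functions and state-control value functions.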

