LG · SY · ML · Jan 17, 2025

Wasserstein Adaptive Value Estimation for Actor-Critic Reinforcement Learning

arXiv:2501.10605v2 · 4 citations · h-index: 10 · L4DC
Originality: Highly original
AI Analysis

This work addresses stability issues in deep reinforcement learning that matter to practitioners, though it appears incremental: it builds on existing actor-critic methods and adds a novel regularization approach.

The paper tackles instability in actor-critic reinforcement learning by introducing Wasserstein Adaptive Value Estimation (WAVE), which adds an adaptively weighted Wasserstein regularization term to the critic's loss. The authors prove an O(1/k) convergence rate for the critic's mean squared error and report superior performance over standard actor-critic methods in experiments.
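One generic way to read this summary is sketched below: a squared temporal-difference (TD) loss plus an adaptively weighted Wasserstein penalty, together with what an O(1/k) bound on the critic's mean squared error states. The notation is an illustrative assumption, not the paper's exact objective or theorem: $y$ is a TD target, $\mu_\theta$ the distribution of the critic's value predictions, $\nu_k$ a reference distribution (for example, the previous critic's predictions), $\lambda_k$ the adaptive weight at iteration $k$, $Q^{\pi}$ the true action-value function, and $C$ a constant independent of $k$.

```latex
\mathcal{L}_k(\theta) = \mathbb{E}\left[\left(Q_\theta(s,a) - y\right)^2\right] + \lambda_k \, W\left(\mu_\theta, \nu_k\right),
\qquad
\mathbb{E}\left[\left(Q_{\theta_k}(s,a) - Q^{\pi}(s,a)\right)^2\right] \le \frac{C}{k}
```

Under a bound of this form, halving the guaranteed error requires doubling the number of iterations $k$.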

We present Wasserstein Adaptive Value Estimation for Actor-Critic (WAVE), an approach to enhance stability in deep reinforcement learning through adaptive Wasserstein regularization. Our method addresses the inherent instability of actor-critic algorithms by incorporating an adaptively weighted Wasserstein regularization term into the critic's loss function. We prove that WAVE achieves an $\mathcal{O}\left(\frac{1}{k}\right)$ convergence rate for the critic's mean squared error and provide theoretical guarantees for stability through Wasserstein-based regularization. Using the Sinkhorn approximation for computational efficiency, our approach automatically adjusts the regularization based on the agent's performance. Theoretical analysis and experimental results demonstrate that WAVE achieves superior performance compared to standard actor-critic methods.
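The abstract names two computational pieces: a Sinkhorn approximation of the Wasserstein term and a regularization weight that adapts to the agent's performance. The NumPy sketch below illustrates both under stated assumptions; it is not the authors' implementation. The ground cost |x - y| (a 1-Wasserstein-style cost), the entropic parameter `epsilon`, the use of the previous critic's outputs as the reference sample, and the multiplicative rule in the hypothetical `adaptive_weight` helper are all assumptions. The sketch computes loss values only; training a critic would require an autodiff framework.

```python
import numpy as np


def _logsumexp(m, axis):
    # Numerically stable log(sum(exp(m))) along one axis.
    m_max = np.max(m, axis=axis, keepdims=True)
    return np.log(np.sum(np.exp(m - m_max), axis=axis)) + np.squeeze(m_max, axis=axis)


def sinkhorn_distance(x, y, epsilon=0.1, n_iters=200):
    """Transport cost of the entropy-regularized (Sinkhorn) plan between two
    uniformly weighted 1-D samples, using log-domain updates for stability."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cost = np.abs(x[:, None] - y[None, :])  # assumed ground cost |x_i - y_j|
    log_a = np.full(x.size, -np.log(x.size))  # uniform weights on x
    log_b = np.full(y.size, -np.log(y.size))  # uniform weights on y
    f = np.zeros(x.size)  # dual potential for x
    g = np.zeros(y.size)  # dual potential for y
    for _ in range(n_iters):
        # Alternately rescale so that row sums match a and column sums match b.
        f = epsilon * (log_a - _logsumexp((g[None, :] - cost) / epsilon, axis=1))
        g = epsilon * (log_b - _logsumexp((f[:, None] - cost) / epsilon, axis=0))
    plan = np.exp((f[:, None] + g[None, :] - cost) / epsilon)
    return float(np.sum(plan * cost))


def adaptive_weight(lam, recent_return, previous_return,
                    rate=0.1, lam_min=1e-3, lam_max=10.0):
    # Hypothetical rule: strengthen regularization when performance drops,
    # relax it otherwise, and keep the weight within [lam_min, lam_max].
    if recent_return < previous_return:
        lam *= 1.0 + rate
    else:
        lam *= 1.0 - rate
    return float(np.clip(lam, lam_min, lam_max))


def critic_loss(q_pred, td_target, q_reference, lam, epsilon=0.1):
    # Squared TD error plus a weighted Sinkhorn penalty that keeps the current
    # critic's predictions close (in distribution) to a reference sample.
    q_pred = np.asarray(q_pred, dtype=float)
    td_target = np.asarray(td_target, dtype=float)
    mse = float(np.mean((q_pred - td_target) ** 2))
    return mse + lam * sinkhorn_distance(q_pred, q_reference, epsilon)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q_pred = rng.normal(0.0, 1.0, size=64)
    td_target = q_pred + rng.normal(0.0, 0.1, size=64)
    q_previous = q_pred + 0.5  # stand-in for the previous critic's outputs
    lam = adaptive_weight(1.0, recent_return=90.0, previous_return=100.0)
    print(lam, critic_loss(q_pred, td_target, q_previous, lam))
```

Smaller `epsilon` values bring the Sinkhorn cost closer to the unregularized transport cost but typically need more iterations to converge; the log-domain updates keep such values numerically stable.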
