LG SY ML · Jan 17, 2025

Wasserstein Adaptive Value Estimation for Actor-Critic Reinforcement Learning

Ali Baheri, Zahra Shahrooei, Chirayu Salgarkar

arXiv:2501.10605v2 · 9.44 citations · h-index: 10 · L4DC

Originality: Highly original

AI Analysis

The paper addresses a stability problem in deep reinforcement learning that matters to practitioners, though the contribution appears incremental: it builds on existing actor-critic methods and adds a novel regularization approach.

The paper tackles instability in actor-critic reinforcement learning by introducing Wasserstein Adaptive Value Estimation (WAVE), which adds an adaptively weighted Wasserstein regularization term to the critic's loss. The authors prove an O(1/k) convergence rate for the critic's mean squared error and report better experimental performance than standard actor-critic methods.

We present Wasserstein Adaptive Value Estimation for Actor-Critic (WAVE), an approach to enhance stability in deep reinforcement learning through adaptive Wasserstein regularization. Our method addresses the inherent instability of actor-critic algorithms by incorporating an adaptively weighted Wasserstein regularization term into the critic's loss function. We prove that WAVE achieves an $\mathcal{O}\left(\frac{1}{k}\right)$ convergence rate for the critic's mean squared error and provide theoretical guarantees for stability through Wasserstein-based regularization. Using the Sinkhorn approximation for computational efficiency, our approach automatically adjusts the regularization based on the agent's performance. Theoretical analysis and experimental results demonstrate that WAVE achieves superior performance compared to standard actor-critic methods.
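
To make the idea in the abstract concrete, here is a minimal sketch of a critic loss with a Sinkhorn-approximated Wasserstein penalty and an adaptive weight. It is not the authors' implementation. The abstract does not say which two distributions the Wasserstein term compares or how the weight is updated, so this sketch assumes the penalty is taken between the batch of predicted values and the batch of target values, and it uses a made-up multiplicative rule for the weight. The names `sinkhorn_cost`, `regularized_critic_loss`, and `adapt_weight`, and all default parameters, are hypothetical.

```python
import math


def sinkhorn_cost(x, y, eps=0.1, n_iters=200):
    """Entropy-regularized optimal transport cost between two 1-D samples.

    Both samples get uniform weights; the ground cost is squared distance.
    `eps` is relative to the largest pairwise cost, which keeps the kernel
    entries bounded away from zero.
    """
    n, m = len(x), len(y)
    cost = [[(xi - yj) ** 2 for yj in y] for xi in x]
    scale = max(max(row) for row in cost) or 1.0  # avoid a zero scale
    reg = eps * scale
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    a, b = 1.0 / n, 1.0 / m  # uniform marginals
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternate scaling updates so the plan matches each marginal in turn.
        v = [b / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
        u = [a / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
    # Transport plan is diag(u) @ kernel @ diag(v); return its total cost.
    return sum(
        u[i] * kernel[i][j] * v[j] * cost[i][j]
        for i in range(n)
        for j in range(m)
    )


def regularized_critic_loss(values, targets, lam):
    """Mean squared error plus a weighted Sinkhorn penalty (assumed form)."""
    mse = sum((v - t) ** 2 for v, t in zip(values, targets)) / len(values)
    return mse + lam * sinkhorn_cost(values, targets)


def adapt_weight(lam, recent_return, previous_return,
                 rate=0.1, lam_min=1e-3, lam_max=1.0):
    """Hypothetical rule: regularize more when returns drop, less otherwise."""
    factor = 1.0 + rate if recent_return < previous_return else 1.0 - rate
    return min(lam_max, max(lam_min, lam * factor))
```

In a training loop, one would call `adapt_weight` once per evaluation interval and pass the updated weight to `regularized_critic_loss`. A real implementation would compute the Sinkhorn term with a differentiable tensor library so that gradients flow through it to the critic's parameters.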
