LG SY Jun 11, 2025

Wasserstein Barycenter Soft Actor-Critic

arXiv:2506.10167v37.11 citations h-index: 10

Originality: Incremental advance

AI Analysis

This work targets sample efficiency, a concern for reinforcement learning researchers and practitioners working in continuous control domains, but it appears incremental because it builds on existing actor-critic frameworks.

The paper tackles poor sample efficiency in deep off-policy actor-critic algorithms for continuous control, especially under sparse rewards. It proposes the Wasserstein Barycenter Soft Actor-Critic (WBSAC) algorithm, which the authors report to be more sample-efficient than state-of-the-art off-policy actor-critic methods on MuJoCo tasks.

Deep off-policy actor-critic algorithms have emerged as the leading framework for reinforcement learning in continuous control domains. However, most of these algorithms suffer from poor sample efficiency, especially in environments with sparse rewards. In this paper, we take a step towards addressing this issue by providing a principled directed exploration strategy. We propose the Wasserstein Barycenter Soft Actor-Critic (WBSAC) algorithm, which benefits from a pessimistic actor for temporal difference learning and an optimistic actor to promote exploration. This is achieved by using the Wasserstein barycenter of the pessimistic and optimistic policies as the exploration policy and adjusting the degree of exploration throughout the learning process. We compare WBSAC with state-of-the-art off-policy actor-critic algorithms and show that WBSAC is more sample-efficient on MuJoCo continuous control tasks.
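To make the barycenter idea concrete, here is a minimal sketch rather than the authors' implementation. It assumes both policies are diagonal Gaussians (ignoring the tanh squashing commonly used with SAC), in which case the 2-Wasserstein barycenter is again a diagonal Gaussian whose mean and standard deviation are the weighted averages of the two inputs. The function names, the linear schedule, and its default values are hypothetical; the abstract does not specify how the degree of exploration is adjusted.

```python
import numpy as np


def gaussian_w2_barycenter(mu_pess, std_pess, mu_opt, std_opt, weight):
    """2-Wasserstein barycenter of two diagonal Gaussians.

    `weight` is the mass placed on the optimistic policy (assumed to lie
    in [0, 1]); weight = 0 recovers the pessimistic policy exactly.
    For diagonal covariances the barycenter interpolates means and
    standard deviations linearly, dimension by dimension.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    mu_pess, std_pess = np.asarray(mu_pess, float), np.asarray(std_pess, float)
    mu_opt, std_opt = np.asarray(mu_opt, float), np.asarray(std_opt, float)
    mu = (1.0 - weight) * mu_pess + weight * mu_opt
    std = (1.0 - weight) * std_pess + weight * std_opt
    return mu, std


def linear_weight(step, total_steps, start=0.5, end=0.0):
    """Hypothetical schedule: anneal the optimistic weight linearly."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + frac * (end - start)


def sample_exploration_action(rng, mu_pess, std_pess, mu_opt, std_opt, weight):
    """Draw one action from the barycenter (exploration) policy."""
    mu, std = gaussian_w2_barycenter(mu_pess, std_pess, mu_opt, std_opt, weight)
    return mu + std * rng.standard_normal(mu.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = linear_weight(step=1_000, total_steps=10_000)
    action = sample_exploration_action(
        rng, [0.0, 0.0], [1.0, 1.0], [2.0, 4.0], [3.0, 1.0], w
    )
    print("optimistic weight:", w, "action:", action)
```

With this parameterization, annealing the weight toward zero moves the behavior policy smoothly from a blend of the two actors back to the pessimistic actor, which is one plausible reading of "adjusting the degree of exploration throughout the learning process".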

