LG · AI · Jan 15, 2025

Average-Reward Soft Actor-Critic

arXiv:2501.09080v2 · 8 citations · h-index: 10
AI Analysis

This work addresses a gap in reinforcement learning for researchers and practitioners dealing with temporally-extended problems without discounting, though it appears incremental as it extends existing actor-critic frameworks to a less-explored setting.

The paper tackles the lack of deep reinforcement learning algorithms for entropy-regularized average-reward objectives by introducing an average-reward soft actor-critic algorithm, which the authors report outperforms existing average-reward methods on standard RL benchmarks.

The average-reward formulation of reinforcement learning (RL) has drawn increased interest in recent years for its ability to solve temporally-extended problems without relying on discounting. Meanwhile, in the discounted setting, algorithms with entropy regularization have been developed, leading to improvements over deterministic methods. Despite the distinct benefits of these approaches, deep RL algorithms for the entropy-regularized average-reward objective have not been developed. While policy-gradient-based approaches have recently been presented in the average-reward literature, the corresponding actor-critic framework remains less explored. In this paper, we introduce an average-reward soft actor-critic algorithm to address these gaps in the field. We validate our method by comparing with existing average-reward algorithms on standard RL benchmarks, achieving superior performance for the average-reward criterion.
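
For orientation, the entropy-regularized average-reward objective mentioned in the abstract can be written in one common textbook form, shown below. This is a sketch under assumptions rather than the paper's own notation: the temperature symbol \alpha, the Shannon-entropy bonus \mathcal{H}, and the reward function r are illustrative choices, and the paper may use a different parameterization (for example, a relative-entropy term with respect to a prior policy).

```latex
% Assumed notation (not taken from the paper): \alpha > 0 is an entropy
% temperature, \mathcal{H} is the Shannon entropy of the policy at a state.
\rho^{\pi}
  = \lim_{T \to \infty} \frac{1}{T}\,
    \mathbb{E}_{\pi}\!\left[
      \sum_{t=0}^{T-1}
      \Big( r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big)
    \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big)
  = -\sum_{a} \pi(a \mid s) \log \pi(a \mid s).
```

In contrast to the discounted soft objective, no factor \gamma^t appears here: every time step is weighted equally and the sum is normalized by the horizon T, which is why this criterion suits the temporally-extended problems the abstract describes.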
