LG AI OCJun 14, 2025

DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty

Mingxuan Cui, Duo Zhou, Yuxuan Han, Grani A. Hanasusanto, Qiong Wang, Huan Zhang, Zhengyuan Zhou

arXiv:2506.12622v111.46 citationsh-index: 13Has Code

Originality Incremental advance

AI Analysis

This addresses robustness issues in RL for real-world applications, but it is incremental as it builds on the existing SAC algorithm.

The authors tackled the problem of deep reinforcement learning's lack of robustness to environmental uncertainties by proposing DR-SAC, which achieved up to 9.8 times the average reward of the SAC baseline under perturbations and improved computing efficiency.

Deep reinforcement learning (RL) has achieved significant success, yet its application in real-world scenarios is often hindered by a lack of robustness to environmental uncertainties. To solve this challenge, some robust RL algorithms have been proposed, but most are limited to tabular settings. In this work, we propose Distributionally Robust Soft Actor-Critic (DR-SAC), a novel algorithm designed to enhance the robustness of the state-of-the-art Soft Actor-Critic (SAC) algorithm. DR-SAC aims to maximize the expected value with entropy against the worst possible transition model lying in an uncertainty set. A distributionally robust version of the soft policy iteration is derived with a convergence guarantee. For settings where nominal distributions are unknown, such as offline RL, a generative modeling approach is proposed to estimate the required nominal distributions from data. Furthermore, experimental results on a range of continuous control benchmark tasks demonstrate our algorithm achieves up to $9.8$ times the average reward of the SAC baseline under common perturbations. Additionally, compared with existing robust reinforcement learning algorithms, DR-SAC significantly improves computing efficiency and applicability to large-scale problems.

View on arXiv PDF Code

Similar