LG, Oct 26, 2023

DSAC-C: Constrained Maximum Entropy for Robust Discrete Soft-Actor Critic

arXiv:2310.17173v2 · 1 citation · h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses the safe deployment of reinforcement learning agents in real-world scenarios, though it appears to be an incremental extension of existing SAC algorithms.

The paper extends discrete Soft Actor-Critic (SAC) with statistical constraints based on the Maximum Entropy Principle. Empirical tests on Atari 2600 games indicate improved robustness to domain shifts.

We present a novel extension to the family of Soft Actor-Critic (SAC) algorithms. We argue that, based on the Maximum Entropy Principle, discrete SAC can be further improved via additional statistical constraints derived from a surrogate critic policy. Furthermore, our findings suggest that these constraints provide added robustness against potential domain shifts, which is essential for the safe deployment of reinforcement learning agents in the real world. We provide theoretical analysis and show empirical results in low-data regimes for both in-distribution and out-of-distribution variants of Atari 2600 games.
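
For context, the entropy-regularized policy objective that standard discrete SAC minimizes can be sketched as below. This is a minimal illustration of the baseline only: it does not reproduce the additional statistical constraints or the surrogate critic policy proposed in the paper, and the function names (`softmax`, `discrete_sac_policy_loss`) and the temperature parameter `alpha` are assumptions chosen for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discrete_sac_policy_loss(logits, q_values, alpha):
    """Per-state policy loss of baseline discrete SAC (hypothetical helper).

    Computes sum_a pi(a|s) * (alpha * log pi(a|s) - Q(s, a)), i.e. the
    negative of (expected Q-value + alpha * policy entropy). Because the
    action space is discrete, the expectation is taken exactly over all
    actions instead of being estimated from sampled actions.
    """
    probs = softmax(logits)
    return sum(
        p * (alpha * math.log(p) - q)
        for p, q in zip(probs, q_values)
        if p > 0.0  # skip zero-probability actions to avoid log(0)
    )


# Example: two actions, uniform policy, temperature alpha = 0.5.
q = [1.0, 3.0]
uniform_loss = discrete_sac_policy_loss([0.0, 0.0], q, alpha=0.5)

# The minimizer of this loss is the Boltzmann policy pi ∝ exp(Q / alpha).
boltzmann_loss = discrete_sac_policy_loss([v / 0.5 for v in q], q, alpha=0.5)
```

In this sketch the Boltzmann policy attains a lower loss than the uniform one, which is the maximum-entropy trade-off between expected value and policy entropy that the abstract's constraints are added to.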

