LGAIRO · Jun 20, 2025

Off-Policy Actor-Critic for Adversarial Observation Robustness: Virtual Alternative Training via Symmetric Policy Evaluation

arXiv:2506.16753v1 · h-index: 3 · Has Code · ICML
Originality: Incremental advance
AI Analysis

The paper addresses inefficient training caused by the mutual dependency between agent and adversary in adversarial RL, a problem relevant to settings such as robotics and autonomous systems. The contribution is incremental, since it builds on existing robust RL approaches.

The paper tackles adversarial observation robustness in reinforcement learning with an off-policy method that reformulates adversarial learning as a soft-constrained optimization problem. This removes the need for additional environment interactions, and the method performs competitively with state-of-the-art approaches on benchmarks.

Recently, robust reinforcement learning (RL) methods designed to handle adversarial input observations have received significant attention, motivated by RL's inherent vulnerabilities. While existing approaches have shown reasonable success, handling worst-case scenarios over long time horizons requires alternating learning: training adversaries to minimize the agent's cumulative reward and training the agent to counteract them. However, this process introduces mutual dependencies between the agent and the adversary, making interactions with the environment inefficient and hindering the development of off-policy methods. In this work, we propose a novel off-policy method that eliminates the need for additional environmental interactions by reformulating adversarial learning as a soft-constrained optimization problem. Our approach is theoretically supported by the symmetric property of policy evaluation between the agent and the adversary. The implementation is available at https://github.com/nakanakakosuke/VALT_SAC.
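The listing does not give the paper's actual formulation, so the following is only a generic toy sketch of what a "soft" (penalty-based) constraint on an observation adversary can look like, as opposed to a hard projection onto an ε-ball. It is not the VALT_SAC algorithm. Everything here is a hypothetical assumption for illustration: the linear policy `W`, the quadratic critic, the penalty weight `lam`, and the helper names `soft_constrained_attack` and `critic`.

```python
import numpy as np


def critic(W, s, delta):
    """Hypothetical critic Q(s, a) = -||a - W s||^2, evaluated on the TRUE state s
    while the agent acts on the perturbed observation s + delta."""
    a = W @ (s + delta)            # agent's action from the perturbed observation
    return -np.sum((a - W @ s) ** 2)


def soft_constrained_attack(W, s, eps, lam=10.0, lr=0.01, steps=500, seed=0):
    """Toy adversary: gradient descent on
        J(delta) = Q(s, W(s + delta)) + lam * max(0, ||delta||^2 - eps^2)
    i.e. lower the critic value, with a penalty (soft constraint) on exceeding
    the perturbation budget eps instead of projecting back onto the ball.
    """
    rng = np.random.default_rng(seed)
    delta = 1e-3 * rng.standard_normal(s.shape)   # small random start (s only fixes the shape)
    for _ in range(steps):
        # Gradient of the critic term: d/d(delta) of -||W delta||^2 = -2 W^T W delta
        grad = -2.0 * W.T @ (W @ delta)
        # Penalty gradient is active only while the budget is exceeded
        if delta @ delta > eps ** 2:
            grad += 2.0 * lam * delta
        delta = delta - lr * grad
    return delta


if __name__ == "__main__":
    W = np.diag([2.0, 1.0])
    s = np.array([1.0, -0.5])
    delta = soft_constrained_attack(W, s, eps=0.1)
    print("||delta|| =", np.linalg.norm(delta), " Q =", critic(W, s, delta))
```

In this toy, the penalty keeps ‖δ‖ hovering near ε only if `lam` exceeds the largest eigenvalue of WᵀW (4 here). Otherwise the critic term dominates and δ grows without bound. The adversary also concentrates its budget along the direction the policy is most sensitive to (the first axis). A hard-constrained variant would instead project δ back onto the ε-ball after each step.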

Code Implementations: 1 repo