LG · Jan 12

Reward-Preserving Attacks For Robust Reinforcement Learning

arXiv:2601.07118v1 · h-index: 12
Originality: Incremental advance
AI Analysis

The paper addresses adversarial robustness in RL, which is crucial for deploying RL in real-world, safety-critical applications, although it is an incremental improvement over existing methods.

The paper tackles adversarial robustness in reinforcement learning by proposing α-reward-preserving attacks, which adapt the adversary's strength so that a fraction of the nominal-to-worst-case return gap is preserved. This improves robustness across attack radii while maintaining nominal performance.
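One way to state the α-reward-preserving condition formally is given below. The notation is illustrative and not taken from the paper: $V_\eta(s)$ is assumed to denote the expected return from state $s$ when the adversary attacks with magnitude $\eta$, $V_0(s)$ is the nominal (unattacked) return, and $\eta_{\mathcal B}$ is the maximum attack budget.

$$
\eta^\star(s) = \max\Big\{\eta \in [0, \eta_{\mathcal B}] \;:\; V_\eta(s) \ge V_{\mathrm{worst}}(s) + \alpha\big(V_0(s) - V_{\mathrm{worst}}(s)\big)\Big\},
\qquad
V_{\mathrm{worst}}(s) = \min_{\eta' \in [0, \eta_{\mathcal B}]} V_{\eta'}(s)
$$

Under this reading, with $\alpha \in [0,1]$ and a return that decreases monotonically in $\eta$, $\alpha = 1$ gives $\eta^\star = 0$ (no attack), $\alpha = 0$ gives the full budget $\eta_{\mathcal B}$, and intermediate $\alpha$ interpolates between the two.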

Adversarial robustness in RL is difficult because perturbations affect entire trajectories: strong attacks can break learning, while weak attacks yield little robustness, and the appropriate strength varies by state. We propose $α$-reward-preserving attacks, which adapt the strength of the adversary so that an $α$ fraction of the nominal-to-worst-case return gap remains achievable at each state. In deep RL, we use a gradient-based attack direction and learn a state-dependent magnitude $η\le η_{\mathcal B}$ selected via a critic $Q^π_α((s,a),η)$ trained off-policy over diverse radii. This adaptive tuning calibrates attack strength and, with intermediate $α$, improves robustness across radii while preserving nominal performance, outperforming fixed- and random-radius baselines.
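The two components named in the abstract, a gradient-based attack direction and a critic-selected magnitude $η \le η_{\mathcal B}$, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. `toy_critic`, `select_magnitude`, and `perturb` are hypothetical names. The closed-form `toy_critic` stands in for the learned, off-policy critic $Q^π_α((s,a),η)$. The direction is an FGSM-style sign step on a supplied value gradient. The selection rule (the largest radius on a grid whose estimated return keeps an α fraction of the gap) is one interpretation of the abstract.

```python
import numpy as np


def toy_critic(state, action, eta):
    """Hypothetical stand-in for the learned critic Q^pi_alpha((s, a), eta).

    Returns the expected return when the adversary attacks with magnitude eta.
    Here the return simply decays quadratically with eta; a real critic would
    be a network trained off-policy over many radii.
    """
    return 10.0 - 4.0 * eta ** 2


def select_magnitude(critic, state, action, radii, alpha):
    """Pick the largest eta in `radii` whose estimated return keeps at least an
    alpha fraction of the nominal-to-worst-case gap (assumed criterion).

    Assumes radii contains 0 (nominal) and eta_B (the budget) as its extremes.
    """
    radii = np.sort(np.asarray(radii, dtype=float))
    q = np.array([critic(state, action, eta) for eta in radii])
    q_nominal = q[0]   # return with no attack (eta = 0)
    q_worst = q.min()  # worst estimated return over the budget
    threshold = q_worst + alpha * (q_nominal - q_worst)
    feasible = radii[q >= threshold]
    return float(feasible.max()) if feasible.size else 0.0


def perturb(state, value_grad, eta):
    """FGSM-style attack: step the observation against the sign of the value
    gradient, staying inside an L-infinity ball of radius eta."""
    return state - eta * np.sign(value_grad)


if __name__ == "__main__":
    state = np.array([0.2, -0.5])
    radii = np.linspace(0.0, 1.0, 11)  # eta_B = 1.0 (hypothetical budget)
    eta = select_magnitude(toy_critic, state, 0, radii, alpha=0.5)
    grad = np.array([1.0, -2.0])        # hypothetical dV/ds at this state
    print(eta, perturb(state, grad, eta))
```

With α = 0.5 the toy critic yields a threshold of 8, so the selected magnitude is about 0.7; α = 1 returns 0 (no attack) and α = 0 returns the full budget 1.0. Scoring a discrete grid of radii is a simple way to query a critic conditioned on η; a learned critic could be queried the same way.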

