LG | AI | May 27, 2025

Multi-level Certified Defense Against Poisoning Attacks in Offline Reinforcement Learning

arXiv:2505.20621v1 | 4 citations | h-index: 28 | ICLR
Originality: Highly original
AI Analysis

This work addresses safety and reliability concerns in offline RL for applications relying on externally sourced datasets, representing a strong specific gain rather than a broad paradigm shift.

The paper tackles the vulnerability of offline reinforcement learning to poisoning attacks by extending certified defenses to cover both per-state actions and the overall expected cumulative reward. In the reported evaluations, performance drops by no more than 50% with up to 7% of the training data poisoned, compared with 0.008% in prior work.

Like other machine learning frameworks, offline reinforcement learning (RL) is vulnerable to poisoning attacks because it relies on externally sourced datasets, and its sequential nature exacerbates this vulnerability. To mitigate the risks posed by RL poisoning, we extend certified defenses to provide larger guarantees against adversarial manipulation, ensuring robustness for both per-state actions and the overall expected cumulative reward. Our approach leverages properties of Differential Privacy in a manner that allows this work to span both continuous and discrete spaces, as well as stochastic and deterministic environments, significantly expanding the scope and applicability of achievable guarantees. Empirical evaluations demonstrate that our approach limits the performance drop to no more than $50\%$ with up to $7\%$ of the training data poisoned, significantly improving over the $0.008\%$ in prior work~\citep{wu_copa_2022}, while also producing certified radii that are $5$ times larger. This highlights the potential of our framework to enhance safety and reliability in offline RL.

