On the Perturbed States for Transformed Input-robust Reinforcement Learning
This work addresses the robust deployment of RL agents in real-world scenarios, but it is incremental: it explores one specific avenue, input transformations, rather than offering a fundamental breakthrough.
The paper tackles the vulnerability of reinforcement learning agents to adversarial perturbations of their input observations. It proposes Transformed Input-robust RL (TIRL), which applies input transformation-based defenses: autoencoder-style denoising to reconstruct the original state, and bounded transformations such as vector quantization so that clean and perturbed states yield close transformed inputs. In MuJoCo environments, these defenses protect against several adversaries.
Reinforcement Learning (RL) agents that are proficient in their training environment remain vulnerable to adversarial perturbations of their input observations during deployment. This underscores the importance of building a robust agent before real-world deployment. To address this vulnerability, prior works focus on robust training-based procedures, either fortifying the robustness of the deep neural network component or subjecting the agent to adversarial training against potent attacks. In this work, we propose a novel method referred to as Transformed Input-robust RL (TIRL), which explores another avenue to mitigate the impact of adversaries: input transformation-based defenses. Specifically, we introduce two principles for applying transformation-based defenses when learning robust RL agents: (1) autoencoder-style denoising to reconstruct the original state and (2) bounded transformations (bit-depth reduction and vector quantization (VQ)) that keep the transformed inputs of clean and perturbed states close to each other. The transformations are applied to the state before it is fed into the policy network. Extensive experiments on multiple MuJoCo environments demonstrate that input transformation-based defenses, VQ in particular, defend against several adversaries acting on the state observations. The official code is available at https://github.com/tunglm2203/tirl
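
To make the two bounded transformations named in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. The function names `bit_depth_reduction` and `vector_quantize`, the per-dimension bounds `low` and `high`, and the fixed hand-written codebook are all assumptions made for illustration; in practice the VQ codebook would presumably be learned, and the official repository may structure this differently.

```python
import numpy as np


def bit_depth_reduction(state, low, high, bits):
    """Snap each state dimension to one of 2**bits evenly spaced levels in [low, high].

    Hypothetical helper: `low`, `high`, and `bits` are illustrative parameters.
    """
    levels = 2 ** bits - 1
    clipped = np.clip(state, low, high)
    normalized = (clipped - low) / (high - low)          # scale to [0, 1]
    quantized = np.round(normalized * levels) / levels   # snap to the grid
    return low + quantized * (high - low)                # scale back


def vector_quantize(state, codebook):
    """Replace the state with its nearest codebook vector (Euclidean distance).

    Hypothetical helper: `codebook` has shape (num_codes, state_dim) and is
    fixed here, whereas a real system would likely learn it from data.
    """
    distances = np.linalg.norm(codebook - state, axis=1)
    return codebook[np.argmin(distances)]


# A clean state and a slightly perturbed copy of it (illustrative values).
clean = np.array([0.30, -0.72])
perturbed = clean + np.array([0.02, -0.02])

# Assumed toy codebook with three entries.
codebook = np.array([[0.0, 0.0], [0.5, -0.5], [-0.5, 0.5]])

print(bit_depth_reduction(clean, -1.0, 1.0, bits=3))
print(bit_depth_reduction(perturbed, -1.0, 1.0, bits=3))
print(vector_quantize(clean, codebook))
print(vector_quantize(perturbed, codebook))
```

In this toy case the clean and perturbed states land in the same quantization cell, so the policy network would receive identical inputs for both. That is the intuition behind "close transformed inputs": a small perturbation only changes the transformed state when it pushes the observation across a cell boundary, and for in-range states the bit-depth transformation itself moves each dimension by at most half a grid step.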