Robust Deep Reinforcement Learning Through Adversarial Attacks and Training: A Survey
This is a survey paper that synthesizes existing work on adversarial robustness in DRL, so its contribution is incremental in nature.
The paper addresses the vulnerability of deep reinforcement learning (DRL) agents to minor variations in conditions, which undermines their reliability in real-world applications. It does so by surveying adversarial attack and training methods intended to improve robustness.
Deep Reinforcement Learning (DRL) is a subfield of machine learning for training autonomous agents that take sequential actions across complex environments. Despite its strong performance in well-known environments, it remains susceptible to minor variations in conditions, raising concerns about its reliability in real-world applications. To be usable in practice, DRL must demonstrate trustworthiness and robustness. One way to improve the robustness of DRL to unknown changes in environmental conditions and to possible perturbations is Adversarial Training: training the agent against well-suited adversarial attacks on the observations and the dynamics of the environment. To address this critical issue, our work presents an in-depth analysis of contemporary adversarial attack and training methodologies, systematically categorizing them and comparing their objectives and operational mechanisms.
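To make the idea of an adversarial attack on observations concrete, the following is a minimal sketch, not a method taken from the survey. It assumes a toy linear-softmax policy and applies an FGSM-style (fast gradient sign) perturbation, bounded by `eps` in the L-infinity norm, that lowers the probability of the action the agent would have chosen on the clean observation. The names `policy_probs`, `fgsm_observation_attack`, `W`, and `eps` are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over a 1-D vector of logits.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def policy_probs(W, obs):
    # Toy policy (an assumption of this sketch): logits are linear in the observation.
    return softmax(W @ obs)


def fgsm_observation_attack(W, obs, eps):
    """Perturb obs within an L-infinity ball of radius eps so that the
    action preferred on the clean observation becomes less likely."""
    probs = policy_probs(W, obs)
    action = int(np.argmax(probs))
    # Gradient of log pi(action | obs) with respect to obs for a
    # linear-softmax policy: W[action] - sum_k probs[k] * W[k].
    grad = W[action] - probs @ W
    # Step against the gradient sign to decrease log pi(action | obs).
    adv_obs = obs - eps * np.sign(grad)
    return adv_obs, action


rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # 3 actions, 4 observation features (illustrative sizes)
obs = rng.normal(size=4)
eps = 0.1

adv_obs, action = fgsm_observation_attack(W, obs, eps)
p_clean = policy_probs(W, obs)[action]
p_adv = policy_probs(W, adv_obs)[action]
print(f"P(action={action}) clean: {p_clean:.4f}, under attack: {p_adv:.4f}")
```

In an adversarial training loop, the agent would be trained on perturbed observations such as `adv_obs` instead of, or alongside, the clean ones. Attacks on the dynamics of the environment are not shown in this sketch.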