Adversarial Attacks and Detection on Reinforcement Learning-Based Interactive Recommender Systems
This work addresses security vulnerabilities in interactive recommender systems and makes an incremental contribution to adversarial robustness in AI.
The paper proposes an attack-agnostic method for detecting adversarial attacks on reinforcement learning-based interactive recommender systems. Its experiments show that most of the tested adversarial attacks are effective, that both attack strength and attack frequency affect attack performance, and that strategically-timed attacks achieve comparable attack performance at only 1/3 to 1/2 of the attack frequency.
Adversarial attacks are difficult to detect at an early stage. We propose attack-agnostic detection for reinforcement learning-based interactive recommendation systems. We first craft adversarial examples to show their diverse distributions, and then augment recommendation systems with a deep learning-based classifier, trained on the crafted data, that detects potential attacks. Finally, we study the attack strength and frequency of adversarial examples and evaluate our model on standard datasets with multiple crafting methods. Our extensive experiments show that most adversarial attacks are effective, and that both attack strength and attack frequency affect attack performance. The strategically-timed attack achieves comparable attack performance with only 1/3 to 1/2 of the attack frequency. In addition, our black-box detector, trained with one crafting method, generalizes across several crafting methods.
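To make the idea of a strategically-timed attack concrete, the following is a minimal sketch of one possible timing rule. The abstract does not state which criterion the authors use, so the rule here is an assumption: the attacker perturbs a step only when the gap between the agent's most and least preferred actions exceeds a threshold. The function name `attack_schedule`, the `threshold` value, and the example probabilities are all hypothetical and are not taken from the paper.

```python
def attack_schedule(action_probs, threshold):
    """Return one boolean per interaction step: True means "attack this step".

    Assumed rule (not confirmed by the source): attack only when the gap
    between the highest and lowest action probability exceeds `threshold`,
    i.e. when the agent has a strong preference that is worth disrupting.
    """
    return [max(p) - min(p) > threshold for p in action_probs]


# Hypothetical policy outputs: 4 interaction steps, 3 candidate items each.
probs = [
    [0.90, 0.05, 0.05],  # strong preference
    [0.40, 0.30, 0.30],  # weak preference
    [0.34, 0.33, 0.33],  # nearly uniform
    [0.70, 0.20, 0.10],  # strong preference
]

mask = attack_schedule(probs, threshold=0.5)

# Fraction of steps attacked, compared with attacking at every step.
frequency = sum(mask) / len(mask)

print(mask)       # [True, False, False, True]
print(frequency)  # 0.5
```

In this toy example the attacker perturbs half of the steps. Raising `threshold` lowers the attack frequency further, which mirrors the trade-off between attack frequency and attack performance described in the abstract.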