LG CR ML | Aug 17, 2018

Data Poisoning Attacks in Contextual Bandits

Yuzhe Ma, Kwang-Sung Jun, Lihong Li, Xiaojin Zhu

arXiv:1808.05760v2 | 16.972 citations

Originality: Incremental advance

AI Analysis

This work addresses security vulnerabilities in contextual bandits, which are used in applications such as online recommendation and adaptive medical treatment. The advance is incremental because it builds on existing attack frameworks.

The paper tackles offline data poisoning attacks in contextual bandits, showing that an attacker can hijack the algorithm's behavior by slightly manipulating rewards to force it to pull a target arm for a target contextual vector, with experiments demonstrating the attack's efficiency.

We study offline data poisoning attacks in contextual bandits, a class of reinforcement learning problems with important applications in online recommendation and adaptive medical treatment, among others. We provide a general attack framework based on convex optimization and show that by slightly manipulating rewards in the data, an attacker can force the bandit algorithm to pull a target arm for a target contextual vector. The target arm and target contextual vector are both chosen by the attacker. That is, the attacker can hijack the behavior of a contextual bandit. We also investigate the feasibility and the side effects of such attacks, and identify future directions for defense. Experiments on both synthetic and real-world data demonstrate the efficiency of the attack algorithm.
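The reward-manipulation idea in the abstract can be illustrated with a small sketch. It is not the paper's formulation: it assumes a greedy linear bandit that fits one ridge regression per arm (no exploration bonus), and it lets the attacker alter only the target arm's logged rewards. Under those assumptions the target arm's predicted reward at the target context is linear in its rewards, so the smallest L2 perturbation satisfying a single margin constraint has a closed form (projection onto a half-space). The names `ridge_estimate`, `poison_target_arm`, `eps`, and `lam` are hypothetical.

```python
import numpy as np

def ridge_estimate(X, r, lam=1.0):
    """Ridge regression estimate for one arm: (X^T X + lam I)^{-1} X^T r."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ r)

def poison_target_arm(X_by_arm, r_by_arm, target_arm, x_star, eps=0.1, lam=1.0):
    """Smallest L2 change to the target arm's rewards so that its predicted
    reward at x_star beats every other arm by at least eps.

    Assumption (not from the paper): greedy arm selection and an attacker
    restricted to the target arm's rewards.
    """
    # Best predicted reward among the competing arms (left unpoisoned).
    rival = max(
        x_star @ ridge_estimate(X_by_arm[a], r_by_arm[a], lam)
        for a in X_by_arm if a != target_arm
    )
    X, r = X_by_arm[target_arm], r_by_arm[target_arm]
    A = X.T @ X + lam * np.eye(X.shape[1])
    # The prediction x_star^T theta equals c^T r, i.e. it is linear in r.
    c = X @ np.linalg.solve(A, x_star)
    gap = rival + eps - c @ r
    if gap <= 0:
        return np.zeros_like(r)  # target arm already wins by the margin
    # Minimum-norm delta with c^T (r + delta) = rival + eps.
    # Assumes c is nonzero (x_star not orthogonal to the arm's data).
    return gap / (c @ c) * c

# Synthetic demo: 3 arms, 4-dimensional contexts, 20 logged rounds per arm.
rng = np.random.default_rng(0)
X_by_arm = {a: rng.normal(size=(20, 4)) for a in range(3)}
r_by_arm = {a: X_by_arm[a] @ rng.normal(size=4) + 0.1 * rng.normal(size=20)
            for a in range(3)}
x_star = rng.normal(size=4)
delta = poison_target_arm(X_by_arm, r_by_arm, target_arm=2, x_star=x_star)
poisoned = dict(r_by_arm)
poisoned[2] = r_by_arm[2] + delta
scores = {a: x_star @ ridge_estimate(X_by_arm[a], poisoned[a]) for a in range(3)}
print("greedy choice at x_star after poisoning:", max(scores, key=scores.get))
print("L2 norm of the reward perturbation:", np.linalg.norm(delta))
```

A fuller treatment would allow changes to every arm's rewards and account for the bandit's confidence bounds, which turns the problem into a general convex program rather than a single projection.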
