LG, CR, ML | Aug 17, 2018

Data Poisoning Attacks in Contextual Bandits

arXiv:1808.05760v2 | 72 citations
Originality: Incremental advance
AI Analysis

This paper addresses security vulnerabilities in contextual bandits, which are used in applications such as online recommendation and adaptive medical treatment. The advance is incremental because it builds on existing attack frameworks.

The paper studies offline data poisoning attacks on contextual bandits. It shows that an attacker who slightly manipulates the rewards in the training data can hijack the algorithm's behavior, forcing it to pull an attacker-chosen target arm for an attacker-chosen target context vector. Experiments on synthetic and real-world data demonstrate the attack's efficiency.

We study offline data poisoning attacks in contextual bandits, a class of reinforcement learning problems with important applications in online recommendation and adaptive medical treatment, among others. We provide a general attack framework based on convex optimization and show that by slightly manipulating rewards in the data, an attacker can force the bandit algorithm to pull a target arm for a target contextual vector. The target arm and target contextual vector are both chosen by the attacker. That is, the attacker can hijack the behavior of a contextual bandit. We also investigate the feasibility and the side effects of such attacks, and identify future directions for defense. Experiments on both synthetic and real-world data demonstrate the efficiency of the attack algorithm.
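The idea of forcing a target arm by minimally perturbing rewards can be illustrated with a small sketch. The version below is a simplification under stated assumptions, not the paper's exact formulation: each arm is assumed to be fit by ridge regression, the bandit is assumed to pick arms greedily (no exploration bonus), and only the target arm's rewards are perturbed. Under those assumptions the convex program reduces to a single linear constraint with a closed-form minimum-norm solution. All function names (`solve`, `gram`, `ridge_estimate`, `poison_target_arm`, `greedy_arm`) and the synthetic data are hypothetical.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gram(X, lam):
    """Regularized Gram matrix X^T X + lam * I."""
    d = len(X[0])
    return [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
             for j in range(d)] for i in range(d)]

def ridge_estimate(X, r, x_star, lam=1.0):
    """Ridge-regression estimate of the reward at context x_star."""
    d = len(X[0])
    b = [sum(row[i] * ri for row, ri in zip(X, r)) for i in range(d)]
    return dot(x_star, solve(gram(X, lam), b))

def greedy_arm(data, x_star, lam=1.0):
    """Arm with the highest estimated reward (assumed greedy selection)."""
    return max(data, key=lambda a: ridge_estimate(*data[a], x_star, lam))

def poison_target_arm(data, target, x_star, eps=0.1, lam=1.0):
    """Smallest L2 perturbation of the target arm's rewards that makes its
    estimate at x_star beat every other arm by at least eps."""
    X, r = data[target]
    # The estimate is linear in the rewards: x*^T theta_hat = w . r,
    # where w_i = X_i . (A^{-1} x*) and A is the regularized Gram matrix.
    z = solve(gram(X, lam), x_star)
    w = [dot(row, z) for row in X]
    best_other = max(ridge_estimate(Xa, ra, x_star, lam)
                     for a, (Xa, ra) in data.items() if a != target)
    gap = best_other + eps - dot(w, r)
    if gap <= 0:
        return [0.0] * len(r)      # target arm already wins by the margin
    scale = gap / dot(w, w)        # projection onto the half-space w . delta >= gap
    return [scale * wi for wi in w]

# Synthetic data (assumption): three arms, each rewarding one coordinate.
random.seed(0)
d = 3
true_theta = {0: [1.0, 0.0, 0.0], 1: [0.0, 1.0, 0.0], 2: [0.0, 0.0, 1.0]}
data = {}
for arm, theta in true_theta.items():
    X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(50)]
    r = [dot(x, theta) + random.gauss(0, 0.1) for x in X]
    data[arm] = (X, r)

x_star = [1.0, 0.0, 0.0]   # arm 0 is the natural choice for this context
target = 2                 # the attacker wants arm 2 pulled instead
delta = poison_target_arm(data, target, x_star)

X_t, r_t = data[target]
poisoned = dict(data)
poisoned[target] = (X_t, [ri + di for ri, di in zip(r_t, delta)])

print("arm before attack:", greedy_arm(data, x_star))
print("arm after attack:", greedy_arm(poisoned, x_star))
print("perturbation L2 norm:", math.sqrt(dot(delta, delta)))
```

Extending the sketch to perturb every arm's rewards, or to account for an exploration bonus, would add one linear constraint per competing arm and would generally need a quadratic-programming solver rather than a closed form.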
