LG, CR · Jun 9, 2017

Certified Defenses for Data Poisoning Attacks

Jacob Steinhardt, Pang Wei Koh, Percy Liang

arXiv:1706.03691v2 · 866 citations

Originality: Incremental advance

AI Analysis

This work addresses the security of ML systems against malicious data manipulation and provides a tool for assessing how robust a defense is. It is an incremental advance because it builds on existing outlier-removal methods.

The paper tackles data poisoning attacks on machine learning systems by constructing approximate upper bounds on the worst-case loss of defenders that use outlier removal followed by empirical risk minimization. It finds that MNIST-1-7 and Dogfish are resilient to attack, while the IMDB sentiment dataset can be driven from 12% to 23% test error by adding only 3% poisoned data.

Machine learning systems trained on user-provided data are susceptible to data poisoning attacks, whereby malicious users inject false training data with the aim of corrupting the learned model. While recent work has proposed a number of attacks and defenses, little is understood about the worst-case loss of a defense in the face of a determined attacker. We address this by constructing approximate upper bounds on the loss across a broad family of attacks, for defenders that first perform outlier removal followed by empirical risk minimization. Our approximation relies on two assumptions: (1) that the dataset is large enough for statistical concentration between train and test error to hold, and (2) that outliers within the clean (non-poisoned) data do not have a strong effect on the model. Our bound comes paired with a candidate attack that often nearly matches the upper bound, giving us a powerful tool for quickly assessing defenses on a given dataset. Empirically, we find that even under a simple defense, the MNIST-1-7 and Dogfish datasets are resilient to attack, while in contrast the IMDB sentiment dataset can be driven from 12% to 23% test error by adding only 3% poisoned data.
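The abstract describes the bound only in words. Below is a minimal Python sketch of the underlying minimax idea, built on assumptions that are ours and not the paper's: a 2-D linear classifier with L2-regularized hinge loss, a centroid-distance ("sphere") outlier filter whose centroids come from the clean data alone, and a finite grid of candidate poison points. The key step is that the defender's optimal loss can be no worse than its loss at any particular parameter vector θ. At that θ, adding εn poisoned points raises the average training loss by at most ε times the largest loss of any point that survives the filter. Evaluating this quantity along a sequence of subgradient steps therefore yields an upper bound, and the loss-maximizing points selected along the way are the raw material for a candidate attack. All function names, data, and hyperparameters (`certify`, `sphere_filter`, `radius=1.5`, `lam`, `lr`, and so on) are hypothetical. This is not the authors' implementation.

```python
import math

# Hypothetical toy model: 2-D points, labels in {-1, +1}, linear classifier without bias.

def hinge(theta, x, y):
    """Hinge loss of the linear classifier theta on one labeled point."""
    margin = y * (theta[0] * x[0] + theta[1] * x[1])
    return max(0.0, 1.0 - margin)

def hinge_grad(theta, x, y):
    """A subgradient of the hinge loss with respect to theta."""
    margin = y * (theta[0] * x[0] + theta[1] * x[1])
    if margin >= 1.0:
        return [0.0, 0.0]
    return [-y * x[0], -y * x[1]]

def objective(theta, clean, lam):
    """Average clean hinge loss plus an L2 penalty (lam / 2) * ||theta||^2."""
    avg = sum(hinge(theta, x, y) for x, y in clean) / len(clean)
    return avg + 0.5 * lam * (theta[0] ** 2 + theta[1] ** 2)

def objective_grad(theta, clean, lam):
    """Subgradient of objective() with respect to theta."""
    g = [lam * theta[0], lam * theta[1]]
    for x, y in clean:
        h = hinge_grad(theta, x, y)
        g[0] += h[0] / len(clean)
        g[1] += h[1] / len(clean)
    return g

def class_centroids(data):
    """Mean feature vector of each class."""
    sums = {}
    for x, y in data:
        s = sums.setdefault(y, [0.0, 0.0, 0])
        s[0] += x[0]
        s[1] += x[1]
        s[2] += 1
    return {y: (s[0] / s[2], s[1] / s[2]) for y, s in sums.items()}

def sphere_filter(points, cents, radius):
    """Outlier removal (assumed form): keep points within `radius` of their class centroid."""
    return [(x, y) for x, y in points if math.dist(x, cents[y]) <= radius]

def certify(clean, feasible, eps, lam=0.1, steps=200, lr=0.05):
    """Upper-bound the defender's training objective when eps * n poison points
    (n = len(clean); all losses normalized by n) are drawn from `feasible`."""
    theta = [0.0, 0.0]
    best_bound = math.inf
    attack = []
    for _ in range(steps):
        # Attacker's best response: the surviving point with the largest loss.
        xp, yp = max(feasible, key=lambda p: hinge(theta, p[0], p[1]))
        # At this theta, eps * n poison points add at most eps * (max loss),
        # and the defender's optimum is no worse than this theta.
        bound = objective(theta, clean, lam) + eps * hinge(theta, xp, yp)
        best_bound = min(best_bound, bound)
        attack.append((xp, yp))
        # Defender's response: one subgradient step on clean + poisoned loss.
        g = objective_grad(theta, clean, lam)
        gp = hinge_grad(theta, xp, yp)
        theta = [theta[i] - lr * (g[i] + eps * gp[i]) for i in range(2)]
    return best_bound, attack

# Usage with made-up data.
clean = [((1.0, 2.0), 1), ((2.0, 1.0), 1), ((1.5, 1.5), 1), ((2.0, 2.0), 1),
         ((-1.0, -2.0), -1), ((-2.0, -1.0), -1), ((-1.5, -1.5), -1), ((-2.0, -2.0), -1)]
cents = class_centroids(clean)
grid = [0.5 * i for i in range(-6, 7)]
candidates = [((a, b), y) for a in grid for b in grid for y in (-1, 1)]
feasible = sphere_filter(candidates, cents, radius=1.5)
bound, attack = certify(clean, feasible, eps=0.03)
print(f"upper bound on the poisoned training objective: {bound:.3f}")
```

The bound is taken as the minimum over all iterates, so stopping early only loosens it and never invalidates it. The sketch also filters only the candidate poison points, not the clean data, which loosely mirrors the abstract's assumption (2). Under assumption (1), the resulting bound on training loss is then read as an approximate bound on test loss.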
