Model-Targeted Poisoning Attacks with Provable Convergence
This work, aimed at practitioners concerned with the security of machine learning systems, offers a novel poisoning attack with provable guarantees, although it is incremental in that it builds on prior model-targeted poisoning methods.
The authors tackled the problem of poisoning attacks on convex machine learning models by proposing an efficient attack that provably converges to any attainable target classifier. The distance between the induced classifier and the target classifier is inversely proportional to the square root of the number of poisoning points, and the authors also provide a lower bound on the minimum number of poisoning points needed.
In a poisoning attack, an adversary with control over a small fraction of the training data attempts to select that data in a way that induces a corrupted model that misbehaves in favor of the adversary. We consider poisoning attacks against convex machine learning models and propose an efficient poisoning attack designed to induce a specified model. Unlike previous model-targeted poisoning attacks, our attack comes with provable convergence to any attainable target classifier. The distance from the induced classifier to the target classifier is inversely proportional to the square root of the number of poisoning points. We also provide a lower bound on the minimum number of poisoning points needed to achieve a given target classifier. Our method uses online convex optimization and therefore finds poisoning points incrementally. This provides more flexibility than previous attacks, which require an a priori assumption about the number of poisoning points. Our attack is the first model-targeted poisoning attack that provides provable convergence for convex models, and in our experiments, it either exceeds or matches state-of-the-art attacks in terms of attack success rate and distance to the target model.
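To make the incremental idea concrete, here is a toy sketch. It is a minimal, hypothetical illustration under our own assumptions, not the authors' implementation. The assumptions are: a one-dimensional logistic-regression model with weight `w`, trained on the mean loss plus an L2 penalty `lam`; a small finite pool of `candidates` from which poisoning points are drawn; and a greedy rule. At each step, the rule retrains on the clean plus poisoned data, then adds the candidate on which the current model's loss most exceeds the target model's loss. The helper names `train` and `model_targeted_attack` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, int]  # (feature x, label y in {-1, +1})


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def loss(w: float, x: float, y: int) -> float:
    """Logistic loss of the 1-D linear classifier sign(w * x) on one example."""
    return softplus(-y * w * x)


def train(data: List[Point], lam: float) -> float:
    """Minimise mean logistic loss + (lam / 2) * w**2 by bisection.

    The objective is strictly convex, so its derivative is increasing and
    has exactly one root, which lies inside [-bound, bound].
    """
    def grad(w: float) -> float:
        g = sum(-y * x * sigmoid(-y * w * x) for x, y in data) / len(data)
        return g + lam * w

    bound = max(abs(x) for x, _ in data) / lam + 1.0
    lo, hi = -bound, bound
    for _ in range(100):
        mid = (lo + hi) / 2
        if grad(mid) > 0:
            hi = mid  # root is to the left of mid
        else:
            lo = mid  # root is at or to the right of mid
    return (lo + hi) / 2


def model_targeted_attack(clean: List[Point], w_target: float,
                          candidates: List[Point], n_points: int,
                          lam: float) -> Tuple[List[Point], List[float]]:
    """Greedily add poisoning points one at a time (hypothetical sketch).

    Returns the poisoning points and |w_t - w_target| before each step,
    plus the final distance after the last point is added.
    """
    poison: List[Point] = []
    distances: List[float] = []
    for _ in range(n_points):
        w_t = train(clean + poison, lam)  # model induced so far
        distances.append(abs(w_t - w_target))
        # Pick the candidate where the current model's loss most exceeds
        # the target model's loss.
        best = max(candidates,
                   key=lambda p: loss(w_t, p[0], p[1]) - loss(w_target, p[0], p[1]))
        poison.append(best)
    distances.append(abs(train(clean + poison, lam) - w_target))
    return poison, distances


# Toy usage: the clean data favour w > 0; the attacker targets w = -0.5.
clean = [(1.0, 1), (-1.0, -1), (0.5, 1), (-0.5, -1)]
candidates = [(x, y) for x in (-2.0, -1.0, -0.5, 0.5, 1.0, 2.0) for y in (-1, 1)]
poison, dist = model_targeted_attack(clean, w_target=-0.5, candidates=candidates,
                                     n_points=40, lam=0.1)
print(f"distance to target: {dist[0]:.3f} before, {dist[-1]:.3f} after {len(poison)} points")
```

Each point is chosen where the current model disagrees most with the target, so the induced weight moves toward `w_target`. It then oscillates around the target with steps that shrink as the poisoned set grows, because each new point carries less weight in a larger training set. Nothing here fixes the number of poisoning points in advance; the loop can simply be stopped once the distance is small enough.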