Generalizable Targeted Data Poisoning against Varying Physical Objects
The paper addresses a real-world security threat to machine learning models by making targeted data poisoning attacks more robust to physical variations, though the contribution is incremental over prior work.
The paper tackles the limited generalizability of targeted data poisoning across varying physical conditions such as viewpoint and lighting. It proposes optimizing both gradient direction and magnitude, which outperforms the state of the art by 19.49% in poisoning success rate when poisoning CIFAR-10 images targeting multi-view cars.
Targeted data poisoning (TDP) aims to compromise the model's prediction on a specific (test) target by perturbing a small subset of training data. Existing work on TDP has focused on an overly ideal threat model in which the same image sample of the target is used during both poisoning and inference stages. However, in the real world, a target object often appears in complex variations due to changes of physical settings such as viewpoint, background, and lighting conditions. In this work, we take the first step toward understanding the real-world threats of TDP by studying its generalizability across varying physical conditions. In particular, we observe that solely optimizing gradient directions, as adopted by the best previous TDP method, achieves limited generalization. To address this limitation, we propose optimizing both the gradient direction and magnitude for more generalizable gradient matching, thereby leading to higher poisoning success rates. For instance, our method outperforms the state of the art by 19.49% when poisoning CIFAR-10 images targeting multi-view cars.
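To make the distinction between matching gradient direction alone and matching both direction and magnitude concrete, here is a minimal sketch. The abstract does not state the paper's actual loss, so both functions are illustrative assumptions: `direction_loss` uses cosine similarity as a stand-in for a direction-only objective, and `direction_magnitude_loss` uses squared Euclidean distance as one possible objective that is also sensitive to gradient norm. The names and the toy vectors are hypothetical.

```python
import numpy as np


def direction_loss(g_poison, g_target):
    """1 - cosine similarity: depends only on the angle between the gradients."""
    cos = np.dot(g_poison, g_target) / (
        np.linalg.norm(g_poison) * np.linalg.norm(g_target)
    )
    return float(1.0 - cos)


def direction_magnitude_loss(g_poison, g_target):
    """Squared Euclidean distance: penalizes both angle and norm mismatch.

    Assumption: an illustrative stand-in, not the loss used in the paper.
    """
    diff = g_poison - g_target
    return float(np.dot(diff, diff))


# Toy gradients (hypothetical values): same direction, 10x smaller norm.
g_target = np.array([1.0, 2.0, -1.0])
g_scaled = 0.1 * g_target

print("direction only:       ", direction_loss(g_scaled, g_target))
print("direction + magnitude:", direction_magnitude_loss(g_scaled, g_target))
```

In this toy case the direction-only loss is essentially zero even though the poison gradient is ten times smaller than the target gradient, while the second loss remains clearly positive. That is the kind of mismatch a direction-only objective cannot see.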