Adversarial Perturbation Intensity Achieving Chosen Intra-Technique Transferability Level for Logistic Regression
This work addresses the adversarial vulnerability of machine learning models, specifically for attackers who know the model specification but not the training data. It is incremental in that it builds on existing logistic regression methods.
The authors study adversarial attacks on logistic regression and derive a closed-form expression for the perturbation intensity needed to reach a desired misclassification rate, reporting up to 95% success in evaluations on real-world datasets.
Machine learning models have been shown to be vulnerable to adversarial examples, i.e., the manipulation of data by an attacker to defeat a defender's classifier at test time. We present a novel probabilistic definition of adversarial examples in perfect- or limited-knowledge settings, using prior probability distributions on the defender's classifier. Using the asymptotic properties of logistic regression, we derive a closed-form expression for the intensity of any adversarial perturbation required to achieve a given expected misclassification rate. This technique is relevant in a threat model where the model specification is known and the training data are unknown. To our knowledge, this is the first method that allows an attacker to directly choose the probability of attack success. We evaluate our approach on two real-world datasets.
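To illustrate how a closed form of this kind can arise, the following is a minimal sketch under stated assumptions, not the authors' expression. It assumes the attacker's uncertainty about the defender's weight vector is Gaussian, w ~ N(mu, Sigma), as the asymptotic normality of the maximum-likelihood estimator suggests. It also assumes the classifier predicts the positive class when w·x > 0, with no separate intercept, and that the attack succeeds when w·(x + eps·delta) < 0. Under these assumptions the success probability is Phi(-m/s), where m and s are the mean and standard deviation of w·(x + eps·delta). Setting this equal to a target alpha and squaring gives a quadratic in eps. The function names `success_probability` and `perturbation_intensity`, and all parameter names, are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for Phi and its inverse


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def quad(u, S, v):
    # Bilinear form u^T S v for a matrix S given as a list of rows.
    return sum(u[i] * S[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))


def success_probability(mu, Sigma, x, delta, eps):
    # P(w . (x + eps*delta) < 0) under the ASSUMED model w ~ N(mu, Sigma).
    x_adv = [xi + eps * di for xi, di in zip(x, delta)]
    mean = dot(mu, x_adv)
    std = sqrt(quad(x_adv, Sigma, x_adv))
    return _STD_NORMAL.cdf(-mean / std)


def perturbation_intensity(mu, Sigma, x, delta, alpha):
    # Smallest eps >= 0 whose success probability equals alpha (0 < alpha < 1),
    # or None if no such eps exists along the direction delta.
    z = _STD_NORMAL.inv_cdf(alpha)
    a, b = dot(mu, x), dot(mu, delta)
    sxx, sxd, sdd = quad(x, Sigma, x), quad(x, Sigma, delta), quad(delta, Sigma, delta)

    # (a + eps*b)^2 = z^2 * (sxx + 2*eps*sxd + eps^2*sdd)  ->  A*eps^2 + B*eps + C = 0
    A = b * b - z * z * sdd
    B = 2.0 * (a * b - z * z * sxd)
    C = a * a - z * z * sxx

    if abs(A) < 1e-12:
        roots = [-C / B] if abs(B) > 1e-12 else []
    else:
        disc = B * B - 4.0 * A * C
        if disc < -1e-12:
            return None
        r = sqrt(max(disc, 0.0))
        roots = [(-B - r) / (2.0 * A), (-B + r) / (2.0 * A)]

    # Squaring introduces a spurious root (the one matching 1 - alpha); filter it out
    # by checking each candidate against the unsquared condition.
    valid = [e for e in roots
             if e >= 0 and abs(success_probability(mu, Sigma, x, delta, e) - alpha) < 1e-8]
    return min(valid) if valid else None


if __name__ == "__main__":
    mu = [1.0, 2.0]                      # illustrative estimate of the weights
    Sigma = [[0.1, 0.0], [0.0, 0.1]]     # illustrative covariance of that estimate
    x = [1.0, 1.0]                       # point classified as positive, since mu . x > 0
    delta = [-1.0, -2.0]                 # perturbation direction, here -mu
    eps = perturbation_intensity(mu, Sigma, x, delta, alpha=0.9)
    print(eps, success_probability(mu, Sigma, x, delta, eps))
```

In this toy setting the attacker fixes the direction delta and the target success probability alpha, and the intensity eps follows from the quadratic. The numbers used are illustrative only and are not taken from the paper's experiments.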