Binary Classification Under $\ell_0$ Attacks for General Noise Distribution
This work addresses adversarial robustness in machine learning under $\ell_0$ attacks, providing theoretical guarantees and a phase transition result, though it is incremental as it builds on existing robust classification frameworks.
The paper tackles binary classification under adversarial attacks constrained by the $\ell_0$ norm, where the adversary can perturb a limited number of coordinates, each by an arbitrary amount. It introduces a classification method based on truncation and shows that, asymptotically, if the adversary perturbs no more than $\sqrt{d}$ coordinates, the method achieves near-optimal error, effectively neutralizing the attack. Beyond this threshold, no classifier outperforms random guessing.
Adversarial examples have recently drawn considerable attention in the field of machine learning because small perturbations in the data can result in major performance degradation. This phenomenon is usually modeled by a malicious adversary that can apply perturbations to the data in a constrained fashion, such as being bounded in a certain norm. In this paper, we study this problem when the adversary is constrained by the $\ell_0$ norm; i.e., it can perturb a certain number of coordinates in the input, but has no limit on how much it can perturb those coordinates. Due to the combinatorial nature of this setting, we need to go beyond the standard techniques in robust machine learning to address this problem. We consider a binary classification scenario where $d$ noisy data samples of the true label are provided to us after adversarial perturbations. We introduce a classification method which employs a nonlinear component called truncation, and show that, in an asymptotic scenario, as long as the adversary is restricted to perturbing no more than $\sqrt{d}$ data samples, we can almost achieve the optimal classification error in the absence of the adversary; i.e., we can completely neutralize the adversary's effect. Surprisingly, we observe a phase transition: using a converse argument, we show that if the adversary can perturb more than $\sqrt{d}$ coordinates, no classifier can do better than a random guess.
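To make the truncation idea concrete, the following is a minimal sketch and not the paper's actual construction, which the abstract does not specify. It assumes unit-variance Gaussian noise, a known mean shift `mu`, and a classifier that knows the adversary's budget `k`; the function names and the simple "overwrite with a huge value" adversary are hypothetical choices made for illustration. The classifier computes a per-sample log-likelihood ratio, discards the `k` largest and `k` smallest values, and takes the sign of the remaining sum.

```python
import numpy as np

def llr_gaussian(x, mu):
    """Per-sample log-likelihood ratio log p(x | y=+1) / p(x | y=-1)
    under the assumed model x_i ~ N(y * mu, 1)."""
    return 2.0 * mu * x

def naive_classifier(x, mu):
    """Sign of the untruncated sum of log-likelihood ratios."""
    return 1 if llr_gaussian(x, mu).sum() >= 0 else -1

def truncated_classifier(x, mu, k):
    """Drop the k largest and k smallest per-sample scores,
    then classify by the sign of the remaining sum."""
    scores = np.sort(llr_gaussian(x, mu))
    kept = scores[k:len(scores) - k]
    return 1 if kept.sum() >= 0 else -1

def l0_attack(x, y, k, magnitude=1e6):
    """Hypothetical l0 adversary: overwrite k coordinates with an
    arbitrarily large value pointing toward the wrong label."""
    x_adv = x.copy()
    x_adv[:k] = -y * magnitude
    return x_adv

rng = np.random.default_rng(0)
d, mu, y = 10_000, 0.1, 1
k = 50  # adversarial budget, below sqrt(d) = 100

x = y * mu + rng.standard_normal(d)   # d noisy samples of the true label
x_adv = l0_attack(x, y, k)            # k coordinates corrupted without bound

naive_pred = naive_classifier(x_adv, mu)
trunc_pred = truncated_classifier(x_adv, mu, k)
print("naive:", naive_pred, "truncated:", trunc_pred, "truth:", y)
```

In this toy setting the unbounded perturbations dominate the untruncated sum, while truncation removes them at the cost of also discarding a few clean extreme samples. The fixed numbers here only illustrate the mechanism; they do not reproduce the asymptotic regime or the converse result stated in the abstract.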