On the Hardness of Robust Classification
This work studies the sample and computational complexity of robust classification, giving machine learning practitioners theoretical insight into why adversarial robustness is hard. Its contribution is incremental in that it builds on existing PAC learning theory.
The paper investigates the feasibility of robust learning against adversarial attacks. It shows that no non-trivial concept class can be robustly learned in the distribution-free setting, even against an adversary who perturbs a single input bit. It also shows that monotone conjunctions cannot be robustly learned under the uniform distribution when the adversary can perturb ω(log n) bits, but can be robustly learned when the adversary is limited to O(log n) bits.
It is becoming increasingly important to understand the vulnerability of machine learning models to adversarial attacks. In this paper we study the feasibility of robust learning from the perspective of computational learning theory, considering both sample and computational complexity. In particular, our definition of robust learnability requires polynomial sample complexity. We start with two negative results. We show that no non-trivial concept class can be robustly learned in the distribution-free setting against an adversary who can perturb just a single input bit. We show moreover that the class of monotone conjunctions cannot be robustly learned under the uniform distribution against an adversary who can perturb $\omega(\log n)$ input bits. However, if the adversary is restricted to perturbing $O(\log n)$ bits, then the class of monotone conjunctions can be robustly learned with respect to a general class of distributions (that includes the uniform distribution). Finally, we provide a simple proof of the computational hardness of robust learning on the boolean hypercube. Unlike previous results of this nature, our result does not rely on another computational model (e.g., the statistical query model) nor on any hardness assumption other than the existence of a hard learning problem in the PAC framework.
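To make the role of the perturbation budget concrete, here is a minimal brute-force sketch for monotone conjunctions under the uniform distribution on $\{0,1\}^n$. The abstract does not spell out the robust risk, so the sketch assumes one common definition: a point x counts as an error if some z within Hamming distance rho of x satisfies h(z) != c(z). The function names (`conjunction`, `hamming_ball`, `robust_risk_uniform`) and the toy target and hypothesis are illustrative, not taken from the paper.

```python
from itertools import combinations, product


def conjunction(indices):
    """Return the monotone conjunction of the variables at the given indices."""
    idx = tuple(indices)
    return lambda x: all(x[i] for i in idx)


def hamming_ball(x, rho):
    """Yield every point within Hamming distance rho of x (including x)."""
    n = len(x)
    for k in range(rho + 1):
        for flips in combinations(range(n), k):
            z = list(x)
            for i in flips:
                z[i] ^= 1  # flip bit i
            yield tuple(z)


def robust_risk_uniform(h, c, n, rho):
    """Assumed robust risk: fraction of x in {0,1}^n whose rho-ball
    contains some z on which hypothesis h and target c disagree."""
    bad = 0
    for x in product((0, 1), repeat=n):
        if any(h(z) != c(z) for z in hamming_ball(x, rho)):
            bad += 1
    return bad / 2 ** n


# Toy instance (hypothetical): the hypothesis has one extra literal.
n = 6
c = conjunction([0, 1, 2])     # target concept
h = conjunction([0, 1, 2, 3])  # learned hypothesis

for rho in range(3):
    print(rho, robust_risk_uniform(h, c, n, rho))
```

On this toy instance the standard risk (rho = 0) is 0.0625, and it grows to 0.3125 for rho = 1 and 0.6875 for rho = 2. A hypothesis that is accurate in the usual sense can therefore have large robust risk once the adversary is allowed a few bit flips. The enumeration is exponential in n and is meant only as an illustration for small n.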