Adversarial Robustness: What fools you makes you stronger
This work addresses adversarial robustness in machine learning, offering a novel theoretical framework with potential implications for secure model design, though it is incremental in its simplified setting.
The paper tackles the problem of adversarial robustness by proving an exponential separation in sample complexity between PAC-learning and Equivalence-Query-learning, and introduces a strong adversary model to design an adaptive defense that outputs a classifier provably robust or with low error using exponentially fewer samples than PAC bounds.
We prove an exponential separation for the sample complexity between the standard PAC-learning model and a version of the Equivalence-Query-learning model. We then show that this separation has interesting implications for adversarial robustness. We explore a vision of designing an adaptive defense that in the presence of an attacker computes a model that is provably robust. In particular, we show how to realize this vision in a simplified setting. In order to do so, we introduce a notion of a strong adversary: he is not limited by the type of perturbations he can apply but when presented with a classifier can repetitively generate different adversarial examples. We explain why this notion is interesting to study and use it to prove the following. There exists an efficient adversarial-learning-like scheme such that for every strong adversary $\mathbf{A}$ it outputs a classifier that (a) cannot be strongly attacked by $\mathbf{A}$, or (b) has error at most $ε$. In both cases our scheme uses exponentially (in $ε$) fewer samples than what the PAC bound requires.