Adversarially Robust Learning with Unknown Perturbation Sets
This work addresses the problem of learning robust models when the adversarial threat is unknown to the practitioner, and provides theoretical bounds on the number of samples and attacker interactions required.
This paper investigates learning robust predictors when the set of possible adversarial perturbations is unknown. The authors establish upper bounds on sample complexity and both upper and lower bounds on the number of interactions (successful attacks) needed, relating these to the VC and Littlestone dimensions of the hypothesis class.
We study the problem of learning predictors that are robust to adversarial examples with respect to an unknown perturbation set. Rather than assuming knowledge of the perturbation set, the learner relies on interaction with an adversarial attacker or on access to attack oracles, and we examine different models for such interactions. We obtain upper bounds on the sample complexity, and upper and lower bounds on the number of required interactions (or number of successful attacks) in the different interaction models. The bounds are stated in terms of the VC and Littlestone dimensions of the hypothesis class of predictors and make no assumptions on the perturbation set.
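For concreteness, the robustness objective in this setting is commonly written as a robust risk. The block below is only a sketch in standard notation. The abstract does not fix any symbols, so the names used here are assumptions: $\mathcal{U}$ is the perturbation set, mapping each input $x$ to its allowed perturbations; $\mathcal{H}$ is the hypothesis class; and $\mathcal{D}$ is the data distribution.

```latex
% Robust risk of a predictor h with respect to a perturbation set U
% (notation assumed for illustration; not taken from the abstract).
\[
  \mathrm{R}_{\mathcal{U}}(h;\mathcal{D})
  \;=\;
  \mathbb{E}_{(x,y)\sim\mathcal{D}}
  \Big[\, \sup_{z \in \mathcal{U}(x)} \mathbb{1}\big[h(z) \neq y\big] \Big],
  \qquad h \in \mathcal{H}.
\]
```

Under this reading, the learner cannot evaluate the supremum directly because $\mathcal{U}$ is unknown. It only learns about $\mathcal{U}$ through the perturbations returned by the attacker or the attack oracle, which is why the number of interactions is a resource to be bounded alongside the sample complexity.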