Adversarially Robust Learning with Tolerance
This work addresses the gap between theoretical guarantees and practical methods in adversarial machine learning, offering new bounds for robust learning with tolerance, though it is incremental in extending existing adversarial learning frameworks.
The paper tackles the problem of adversarial PAC-learning by introducing a tolerant version that compares learner error to the best achievable with a slightly larger perturbation radius, bridging theory and practice. It provides the first PAC-type guarantees for popular techniques like perturb-and-smooth, showing sample complexities of O(v(1+1/γ)^{O(d)}/ε) and Õ(d·v·log(1+1/γ)/ε²) in the agnostic setting, in contrast to failures in the non-tolerant setting.
We initiate the study of tolerant adversarial PAC-learning with respect to metric perturbation sets. In adversarial PAC-learning, an adversary is allowed to replace a test point $x$ with an arbitrary point in a closed ball of radius $r$ centered at $x$. In the tolerant version, the error of the learner is compared with the best achievable error with respect to a slightly larger perturbation radius $(1+γ)r$. This simple tweak helps us bridge the gap between theory and practice and obtain the first PAC-type guarantees for algorithmic techniques that are popular in practice. Our first result concerns the widely-used ``perturb-and-smooth'' approach for adversarial learning. For perturbation sets with doubling dimension $d$, we show that a variant of these approaches PAC-learns any hypothesis class $\mathcal{H}$ with VC-dimension $v$ in the $γ$-tolerant adversarial setting with $O\left(\frac{v(1+1/γ)^{O(d)}}{\varepsilon}\right)$ samples. This is in contrast to the traditional (non-tolerant) setting in which, as we show, the perturb-and-smooth approach can provably fail. Our second result shows that one can PAC-learn the same class using $\widetilde{O}\left(\frac{d.v\log(1+1/γ)}{\varepsilon^2}\right)$ samples even in the agnostic setting. This result is based on a novel compression-based algorithm, and achieves a linear dependence on the doubling dimension as well as the VC-dimension. This is in contrast to the non-tolerant setting where there is no known sample complexity upper bound that depend polynomially on the VC-dimension.