An Analysis of Robustness of Non-Lipschitz Networks
This work addresses robustness in deep learning for security-critical applications. It is incremental in that it builds on existing adversarial attack models, to which it adds new theoretical insights.
The paper tackles the vulnerability of deep networks to adversarial attacks by analyzing a model in which adversaries can move data arbitrarily far, but only within random low-dimensional subspaces of feature space. It proves that such adversaries defeat any algorithm that must classify every input, but that they can be overcome by allowing the algorithm to abstain on unusual inputs when classes are reasonably well-separated. It also gives theoretical guarantees for tuning the accuracy-abstention trade-off from data and demonstrates the approach empirically in contrastive learning.
Despite significant advances, deep networks remain highly susceptible to adversarial attack. One fundamental challenge is that small input perturbations can often produce large movements in the network's final-layer feature space. In this paper, we define an attack model that abstracts this challenge, to help understand its intrinsic properties. In our model, the adversary may move data an arbitrary distance in feature space but only in random low-dimensional subspaces. We prove such adversaries can be quite powerful: defeating any algorithm that must classify any input it is given. However, by allowing the algorithm to abstain on unusual inputs, we show such adversaries can be overcome when classes are reasonably well-separated in feature space. We further provide strong theoretical guarantees for setting algorithm parameters to optimize over accuracy-abstention trade-offs using data-driven methods. Our results provide new robustness guarantees for nearest-neighbor style algorithms, and also have application to contrastive learning, where we empirically demonstrate the ability of such algorithms to obtain high robust accuracy with low abstention rates. Our model is also motivated by strategic classification, where entities being classified aim to manipulate their observable features to produce a preferred classification, and we provide new insights into that area as well.
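The attack model and the abstention idea described above can be illustrated with a small sketch. This is not the paper's algorithm or its adversary: it assumes a plain nearest-neighbor rule with a distance threshold, and a simple heuristic attacker that moves a point within a random subspace toward the closest wrong-class training point. The names `random_subspace`, `closest_approach`, `attack`, `predict`, the threshold `tau`, and the `ABSTAIN` sentinel are all hypothetical and chosen for illustration only.

```python
import numpy as np

ABSTAIN = None  # hypothetical sentinel meaning "the classifier declines to answer"


def random_subspace(d, k, rng):
    """Return a (d, k) matrix whose orthonormal columns span a random k-dim subspace of R^d."""
    q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    return q


def closest_approach(x, basis, z):
    """Nearest point to z inside the affine subspace x + span(basis), and its distance to z."""
    diff = z - x
    moved = x + basis @ (basis.T @ diff)  # orthogonal projection of z onto the subspace
    return moved, float(np.linalg.norm(z - moved))


def attack(x, true_label, train_X, train_y, basis):
    """Heuristic adversary (an assumption): move x within the subspace as close as
    possible to some training point carrying a different label."""
    best_point, best_dist = x, np.inf
    for z, y in zip(train_X, train_y):
        if y == true_label:
            continue
        moved, dist = closest_approach(x, basis, z)
        if dist < best_dist:
            best_point, best_dist = moved, dist
    return best_point


def predict(x, train_X, train_y, tau=np.inf):
    """Nearest-neighbor label, or ABSTAIN when the nearest neighbor is farther than tau."""
    dists = np.linalg.norm(train_X - x, axis=1)
    i = int(np.argmin(dists))
    if dists[i] > tau:
        return ABSTAIN
    return int(train_y[i])
```

The distance threshold is what limits the adversary in this sketch. However far the point travels inside the subspace, it cannot get closer to a wrong-class point than the component of their difference orthogonal to the subspace. With `tau=np.inf` the classifier must answer and can be pushed to the wrong label, whereas a finite `tau` smaller than that orthogonal distance turns the same attacked point into an abstention.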