On Extensions of CLEVER: A Neural Network Robustness Evaluation Algorithm
This work addresses robustness evaluation of neural networks in adversarial settings; the contribution is incremental, since it builds on the existing CLEVER method.
The paper extends the CLEVER algorithm for evaluating neural network robustness in two ways: it proposes a second-order score with a formal guarantee for twice-differentiable classifiers, and it adapts CLEVER to handle gradient masking via BPDA. The BPDA variant is demonstrated on a 121-layer DenseNet trained on ImageNet.
CLEVER (Cross-Lipschitz Extreme Value for nEtwork Robustness) is an Extreme Value Theory (EVT) based robustness score for large-scale deep neural networks (DNNs). In this paper, we propose two extensions of this robustness score. First, we provide a new formal robustness guarantee for classifier functions that are twice differentiable. We apply extreme value theory to this guarantee, and we call the resulting robustness estimate the second-order CLEVER score. Second, we discuss how to handle gradient masking, a common defensive technique, using CLEVER with Backward Pass Differentiable Approximation (BPDA). With BPDA applied, CLEVER can evaluate the intrinsic robustness of a broader class of neural networks, namely networks with non-differentiable input transformations. We demonstrate the effectiveness of CLEVER with BPDA in experiments on a 121-layer DenseNet model trained on the ImageNet dataset.
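To make the BPDA idea concrete, the following is a minimal sketch in plain Python, not the paper's implementation. It assumes a toy smooth margin function (hypothetical weights `W`), a quantization step standing in for a non-differentiable input transformation, and an identity approximation in the backward pass. It also simplifies the first-order CLEVER estimate: the actual method fits a reverse Weibull distribution to batch maxima of sampled gradient norms, whereas this sketch takes the plain maximum over samples. All function names and constants here are illustrative assumptions.

```python
import math
import random

LEVELS = 8            # hypothetical quantization depth of the input transformation
W = [0.8, -0.5, 0.3]  # hypothetical weights of a toy margin function


def quantize(x):
    """Non-differentiable input transformation: snap each coordinate to a grid."""
    return [round(v * LEVELS) / LEVELS for v in x]


def margin(z):
    """Toy classification margin f_c(z) - f_j(z); smooth in z."""
    return math.tanh(sum(w * v for w, v in zip(W, z)))


def margin_grad(z):
    """Analytic gradient of the margin with respect to z."""
    s = sum(w * v for w, v in zip(W, z))
    return [(1.0 - math.tanh(s) ** 2) * w for w in W]


def finite_diff_grad(x, eps=1e-6):
    """Numerical gradient through the quantizer; zero almost everywhere (masking)."""
    grad = []
    for i in range(len(x)):
        hi = x[:i] + [x[i] + eps] + x[i + 1:]
        lo = x[:i] + [x[i] - eps] + x[i + 1:]
        grad.append((margin(quantize(hi)) - margin(quantize(lo))) / (2 * eps))
    return grad


def bpda_grad(x):
    """BPDA: forward pass applies quantize, backward pass treats it as identity."""
    return margin_grad(quantize(x))


def sample_in_ball(center, radius, rng):
    """Draw a point uniformly from the L2 ball of the given radius around center."""
    direction = [rng.gauss(0.0, 1.0) for _ in center]
    norm = math.sqrt(sum(v * v for v in direction))
    scale = radius * rng.random() ** (1.0 / len(center)) / norm
    return [c + scale * v for c, v in zip(center, direction)]


def clever_like_score(x0, radius=1.0, n_samples=500, seed=0):
    """Simplified score: margin / (max sampled gradient norm), capped at radius.

    Simplification: uses the sample maximum instead of a reverse Weibull fit.
    """
    rng = random.Random(seed)
    lipschitz = 0.0
    for _ in range(n_samples):
        g = bpda_grad(sample_in_ball(x0, radius, rng))
        lipschitz = max(lipschitz, math.sqrt(sum(v * v for v in g)))
    return min(margin(quantize(x0)) / lipschitz, radius)


x0 = [0.9, 0.1, 0.4]
masked = finite_diff_grad(x0)   # gradient seen through the quantizer
approx = bpda_grad(x0)          # gradient recovered with BPDA
score = clever_like_score(x0)   # robustness estimate in L2 distance
```

In this toy setting the gradient through the quantizer vanishes, so a gradient-based score computed without BPDA would have no usable signal, while the BPDA gradient remains informative and yields a finite robustness estimate.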