Fast and Stable Interval Bounds Propagation for Training Verifiably Robust Models
This work addresses a practical challenge, training verifiably robust models efficiently, though the contribution is incremental because it builds on existing interval arithmetic methods.
The paper tackles the problem of training classification networks that are verifiably robust against adversarial attacks by introducing an additional term in the cost function to encourage small interval bounds at hidden layers, resulting in comparable or better performance with fewer training iterations and improved stability.
We present an efficient technique for training classification networks that are verifiably robust against norm-bounded adversarial attacks. This framework builds on the work of Gowal et al., who apply interval arithmetic to bound the activations at each layer and keep the prediction invariant to the input perturbation. While that method is faster than competing approaches, it requires careful hyper-parameter tuning and a large number of epochs to converge. To speed up and stabilize training, we add a term to the cost function that encourages the model to keep the interval bounds at hidden layers small. Experimental results demonstrate that we can achieve comparable (or even better) results using fewer training iterations, in a more stable fashion. Moreover, the proposed model is less sensitive to the exact specification of the training process, which makes it easier for practitioners to use.
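The idea can be illustrated with a minimal, dependency-free sketch of interval bound propagation through a small fully connected ReLU network, plus an extra penalty on hidden-layer interval widths. Several choices here are assumptions not taken from the text above: the perturbation is an L-infinity ball of radius `eps`, the penalty is the mean width of the post-ReLU hidden intervals weighted by a hypothetical coefficient `lam`, and the robust loss uses the simple worst-case logits (lower bound for the true class, upper bound for all others) rather than any tighter specification. All function names are illustrative.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]  # (weights, bias) of one affine layer


def affine_interval(W: Matrix, b: Vector, lo: Vector, hi: Vector) -> Tuple[Vector, Vector]:
    """Propagate the box [lo, hi] through x -> W x + b using center/radius form."""
    center = [(l + h) / 2 for l, h in zip(lo, hi)]
    radius = [(h - l) / 2 for l, h in zip(lo, hi)]
    new_lo, new_hi = [], []
    for row, bias in zip(W, b):
        mu = sum(w * c for w, c in zip(row, center)) + bias
        r = sum(abs(w) * rad for w, rad in zip(row, radius))  # |W| times radius
        new_lo.append(mu - r)
        new_hi.append(mu + r)
    return new_lo, new_hi


def relu_interval(lo: Vector, hi: Vector) -> Tuple[Vector, Vector]:
    """ReLU is monotone, so it can be applied to each bound separately."""
    return [max(0.0, l) for l in lo], [max(0.0, h) for h in hi]


def ibp_forward(layers: List[Layer], x: Vector, eps: float):
    """Return (output bounds, list of post-ReLU hidden-layer bounds)."""
    lo = [xi - eps for xi in x]  # assumed L-infinity ball around x
    hi = [xi + eps for xi in x]
    hidden = []
    for i, (W, b) in enumerate(layers):
        lo, hi = affine_interval(W, b, lo, hi)
        if i < len(layers) - 1:  # no ReLU after the logit layer
            lo, hi = relu_interval(lo, hi)
            hidden.append((lo, hi))
    return (lo, hi), hidden


def interval_width_penalty(hidden) -> float:
    """Hypothetical regularizer: mean width of all hidden-layer intervals."""
    widths = [h - l for lo, hi in hidden for l, h in zip(lo, hi)]
    return sum(widths) / len(widths) if widths else 0.0


def worst_case_logits(lo: Vector, hi: Vector, y: int) -> Vector:
    """Pessimistic logits: lower bound for the true class, upper bound for others."""
    return [lo[k] if k == y else hi[k] for k in range(len(lo))]


def cross_entropy(logits: Vector, y: int) -> float:
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[y]


def total_loss(layers: List[Layer], x: Vector, y: int, eps: float, lam: float) -> float:
    """Robust IBP loss plus the (assumed) hidden-interval penalty weighted by lam."""
    (lo, hi), hidden = ibp_forward(layers, x, eps)
    robust = cross_entropy(worst_case_logits(lo, hi, y), y)
    return robust + lam * interval_width_penalty(hidden)
```

With `eps = 0` the bounds collapse to the ordinary forward pass, so the robust term reduces to the clean cross-entropy. In a full training setup one would typically also mix in the clean loss, as Gowal et al. do, and compute gradients with an autodiff framework; both are omitted here to keep the sketch self-contained.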