Strength in Numbers: Trading-off Robustness and Computation via Adversarially-Trained Ensembles
This work addresses the need for robust neural networks in safety-critical domains such as self-driving cars. Its contribution is incremental, since it builds on existing adversarial training methods.
The paper studies how the defender's computational budget affects robustness to adversarial attacks on neural networks. It reports that robustness increases with the number of parameters in adversarially-trained models, and that an ensemble of smaller models, adversarially trained as a single model, spends that budget more efficiently than one larger model.
While deep learning has led to remarkable results on a number of challenging problems, researchers have discovered a vulnerability of neural networks in adversarial settings, where small but carefully chosen perturbations to the input can make the models produce extremely inaccurate outputs. This makes these models particularly unsuitable for safety-critical application domains (e.g. self-driving cars) where robustness is extremely important. Recent work has shown that augmenting training with adversarially generated data provides some degree of robustness against test-time attacks. In this paper we investigate how this approach scales as we increase the computational budget given to the defender. We show that increasing the number of parameters in adversarially-trained models increases their robustness, and in particular that ensembling smaller models while adversarially training the entire ensemble as a single model is a more efficient way of spending said budget than simply using a larger single model. Crucially, we show that it is the adversarial training of the ensemble, rather than the ensembling of adversarially trained models, which provides robustness.
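The distinction drawn in the last sentence, adversarially training the ensemble as one model versus ensembling models that were each adversarially trained alone, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the members are tiny linear-logistic classifiers, the ensemble averages member probabilities, the attack is a one-step signed-gradient (FGSM-style) perturbation, and all function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def member_prob(member, x):
    """Probability of class 1 from one linear-logistic member (w, b)."""
    w, b = member
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def ensemble_prob(members, x):
    """Ensemble output: average of member probabilities (an assumption)."""
    return sum(member_prob(m, x) for m in members) / len(members)

def loss(members, x, y):
    """Binary cross-entropy of the ensemble output for a label y in {0, 1}."""
    p = ensemble_prob(members, x)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def input_gradient(members, x, y):
    """Gradient of the ensemble loss with respect to the input x."""
    p = ensemble_prob(members, x)
    dloss_dp = (p - y) / (p * (1.0 - p))  # undefined if p saturates to 0 or 1
    grad = [0.0] * len(x)
    for w, b in members:
        s = member_prob((w, b), x)
        for i, wi in enumerate(w):
            grad[i] += dloss_dp * s * (1.0 - s) * wi / len(members)
    return grad

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(members, x, y, eps):
    """One-step L-infinity attack against the model described by `members`."""
    g = input_gradient(members, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]

def joint_adversarial_step(members, x, y, eps, lr):
    """Attack the whole ensemble, then update every member on that example."""
    x_adv = fgsm(members, x, y, eps)
    p = ensemble_prob(members, x_adv)
    dloss_dp = (p - y) / (p * (1.0 - p))
    updated = []
    for w, b in members:
        s = member_prob((w, b), x_adv)
        scale = dloss_dp * s * (1.0 - s) / len(members)
        new_w = [wi - lr * scale * xi for wi, xi in zip(w, x_adv)]
        updated.append((new_w, b - lr * scale))
    return updated

def independent_adversarial_step(members, x, y, eps, lr):
    """Baseline: each member is attacked and updated in isolation."""
    return [joint_adversarial_step([m], x, y, eps, lr)[0] for m in members]

# Toy usage with two hypothetical members and one labelled point.
members = [([1.0, 2.0], 0.0), ([0.5, 1.5], 0.1)]
x, y = [1.0, 1.0], 1
x_adv = fgsm(members, x, y, eps=0.1)
joint = joint_adversarial_step(members, x, y, eps=0.1, lr=0.5)
separate = independent_adversarial_step(members, x, y, eps=0.1, lr=0.5)
```

In the joint step, both the perturbation and the parameter gradients are computed from the ensemble's loss, so each member is updated in light of what the others predict. In the independent baseline, each member only ever sees attacks crafted against itself. The sketch shows only where the two procedures differ and makes no claim about which is more robust.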