A Primer on Multi-Neuron Relaxation-based Adversarial Robustness Certification
This primer addresses the need for reliable safety assurances in real-world AI deployments, though it builds incrementally on prior work such as Singh et al. (2019).
The paper tackles adversarial vulnerability in deep neural networks by developing a unified mathematical framework for relaxation-based robustness certification, which provides provable guarantees against attacks by any adversary. It also shows that the k-ReLU multi-neuron relaxation method obtains tighter bounds by leveraging relational constraints among groups of neurons.
The existence of adversarial examples poses a real danger when deep neural networks are deployed in the real world. The go-to strategy to quantify this vulnerability is to evaluate the model against specific attack algorithms. This approach is, however, inherently limited, as it says little about the robustness of the model against more powerful attacks not included in the evaluation. We develop a unified mathematical framework to describe relaxation-based robustness certification methods, which go beyond adversary-specific robustness evaluation and instead provide provable robustness guarantees against attacks by any adversary. We discuss the fundamental limitations posed by single-neuron relaxations and show how the recent "k-ReLU" multi-neuron relaxation framework of Singh et al. (2019) obtains tighter correlation-aware activation bounds by leveraging additional relational constraints among groups of neurons. Specifically, we show how additional pre-activation bounds can be mapped to corresponding post-activation bounds and how they can in turn be used to obtain tighter robustness certificates. We also present an intuitive way to visualize different relaxation-based certification methods. By approximating multiple non-linearities jointly instead of separately, the k-ReLU method is able to bypass the convex barrier imposed by single-neuron relaxations.
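To make the idea of mapping pre-activation bounds to post-activation bounds concrete, the following is a minimal sketch of the standard single-neuron "triangle" relaxation of a ReLU. It is not the k-ReLU method itself, only the single-neuron baseline that multi-neuron relaxations aim to tighten. The function names (`triangle_relaxation`, `post_activation_interval`) are hypothetical and chosen for illustration; the sketch assumes scalar pre-activation bounds l <= x <= u are already known.

```python
def relu(x):
    """Plain ReLU on a scalar."""
    return max(x, 0.0)


def triangle_relaxation(l, u):
    """Upper bounding line for y = relu(x) on the interval [l, u].

    Returns (slope, intercept) such that relu(x) <= slope * x + intercept
    for every x in [l, u]. Together with the lower bounds y >= 0 and
    y >= x, this line forms the convex hull of a single ReLU.
    (Hypothetical helper, illustration only.)
    """
    if l > u:
        raise ValueError("lower bound must not exceed upper bound")
    if u <= 0.0:
        return 0.0, 0.0  # neuron is always inactive: y = 0
    if l >= 0.0:
        return 1.0, 0.0  # neuron is always active: y = x
    # Unstable neuron: line through the points (l, 0) and (u, u).
    slope = u / (u - l)
    return slope, -slope * l


def post_activation_interval(l, u):
    """Map pre-activation bounds [l, u] to post-activation bounds.

    ReLU is monotone, so the image of the interval is [relu(l), relu(u)].
    """
    return relu(l), relu(u)


if __name__ == "__main__":
    slope, intercept = triangle_relaxation(-1.0, 1.0)
    print("upper line: y <=", slope, "* x +", intercept)
    print("post-activation interval:", post_activation_interval(-1.0, 2.0))
```

Because each neuron is bounded in isolation here, any relational constraint between several pre-activations is ignored when the activation itself is relaxed. That is the looseness that approximating multiple non-linearities jointly is meant to reduce.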