Theoretical evidence for adversarial robustness through randomization
This work addresses the lack of theoretical justification for adversarial robustness methods, a problem that matters to researchers and practitioners in machine learning security. The contribution is incremental in that it builds on existing empirical approaches.
The paper tackles the lack of theoretical understanding of randomization techniques, which inject noise at inference time to improve adversarial robustness. It provides a theoretical analysis that links the randomization rate to robustness and derives a new upper bound on the adversarial generalization gap; both results are supported by experiments.
This paper investigates the theory of robustness against adversarial attacks. It focuses on the family of randomization techniques that consist of injecting noise into the network at inference time. These techniques have proven effective in many contexts, but lack theoretical justification. We close this gap by presenting a theoretical analysis of these approaches, thereby explaining why they perform well in practice. More precisely, we make two new contributions. The first relates the randomization rate to robustness against adversarial attacks. This result applies to the general family of exponential distributions, and thus extends and unifies previous approaches. The second is a new upper bound on the adversarial generalization gap of randomized neural networks. We support our theoretical claims with a set of experiments.
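To make the idea of noise injection at inference time concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the paper's method: the toy linear classifier, the function names `linear_classifier` and `randomized_predict`, the choice of Gaussian noise, and the values of `sigma` and `n_samples` are all hypothetical. The noise is added to the input only, whereas the techniques discussed here may inject it elsewhere in the network.

```python
import random


def linear_classifier(x, weights, bias):
    """Toy deterministic classifier (hypothetical): class 1 if w.x + b > 0, else class 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def randomized_predict(x, weights, bias, sigma=0.5, n_samples=1000, rng=None):
    """Monte Carlo estimate of the class probabilities when Gaussian noise
    is added to the input before each forward pass.

    Assumption: noise is injected on the input only, with standard deviation sigma.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    counts = [0, 0]
    for _ in range(n_samples):
        # Draw a fresh noise vector for every forward pass.
        noisy_x = [xi + rng.gauss(0.0, sigma) for xi in x]
        counts[linear_classifier(noisy_x, weights, bias)] += 1
    # Empirical frequency of each class over the noisy forward passes.
    return [c / n_samples for c in counts]


probs = randomized_predict([1.0, -0.5], weights=[2.0, 1.0], bias=0.0)
predicted_class = max(range(len(probs)), key=lambda k: probs[k])
```

The randomized classifier outputs a distribution over classes rather than a single label. In this sketch, `sigma` plays the role of the noise intensity: larger values smooth the output distribution more, at the cost of accuracy on clean inputs.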