LG, Feb 15, 2021

Low Curvature Activations Reduce Overfitting in Adversarial Training

arXiv:2102.07861v2, 50 citations
Originality: Incremental advance
AI Analysis

This paper addresses the generalization gap in adversarial training of neural networks and offers a simple way to improve robustness. The advance is incremental because it builds on prior work on activation functions.

The paper tackles overfitting in adversarial training by showing that using activation functions with low curvature reduces both standard and robust generalization gaps, with significant improvements observed for functions like SiLU and LeakyReLU.

Adversarial training is one of the most effective defenses against adversarial attacks. Previous works suggest that overfitting is a dominant phenomenon in adversarial training leading to a large generalization gap between test and train accuracy in neural networks. In this work, we show that the observed generalization gap is closely related to the choice of the activation function. In particular, we show that using activation functions with low (exact or approximate) curvature values has a regularization effect that significantly reduces both the standard and robust generalization gaps in adversarial training. We observe this effect for both differentiable/smooth activations such as SiLU as well as non-differentiable/non-smooth activations such as LeakyReLU. In the latter case, the "approximate" curvature of the activation is low. Finally, we show that for activation functions with low curvature, the double descent phenomenon for adversarially trained models does not occur.
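To make the notion of curvature concrete, the sketch below estimates the second derivative of two activations with a central finite difference. It is an illustration under stated assumptions, not the paper's procedure: the helper names (`fd_curvature`, `max_abs_curvature`), the step size `h`, and the use of a finite difference as a stand-in for the "approximate" curvature of a non-smooth activation are all choices made here, and the paper's own definition may differ.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def silu(x):
    # SiLU (also called swish): x * sigmoid(x), smooth everywhere.
    return x * sigmoid(x)


def leaky_relu(x, alpha):
    # LeakyReLU with negative-side slope alpha; it has a kink at x = 0.
    return x if x >= 0 else alpha * x


def fd_curvature(f, x, h):
    # Central finite-difference estimate of the second derivative of f at x.
    # Assumption: used here as a proxy for "approximate" curvature.
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def max_abs_curvature(f, lo=-6.0, hi=6.0, n=1201, h=0.1):
    # Largest absolute curvature estimate over an evenly spaced grid on [lo, hi].
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return max(abs(fd_curvature(f, x, h)) for x in xs)


if __name__ == "__main__":
    print("SiLU:", max_abs_curvature(silu))
    for alpha in (0.0, 0.2, 0.5):
        est = max_abs_curvature(lambda x, a=alpha: leaky_relu(x, a))
        print(f"LeakyReLU(alpha={alpha}):", est)
```

For SiLU the exact second derivative is bounded in magnitude by 0.5, attained at x = 0. For LeakyReLU the finite-difference estimate at the kink equals (1 - alpha) / h, so at a fixed step size a larger negative-side slope gives a smaller estimate, which is one way to read "low approximate curvature" for a non-smooth activation.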

Code Implementations: 1 repo
