Rethinking Feature Uncertainty in Stochastic Neural Networks for Adversarial Robustness
This work addresses adversarial robustness for machine learning models, presenting an incremental improvement by focusing on feature representation in stochastic neural networks.
The paper tackles the problem of adversarial robustness in deep neural networks by proposing a method that maximizes feature distribution variance in stochastic neural networks, achieving significant improvements over existing methods under white- and black-box attacks.
Deep neural networks (DNNs) have shown remarkable success in many fields. However, adding an imperceptibly small perturbation to the model input can cause a rapid drop in performance. To address this issue, a randomness-based technique named Stochastic Neural Networks (SNNs) has recently been proposed. Specifically, SNNs inject randomness into the model to defend against unseen attacks and improve adversarial robustness. However, existing studies on SNNs mainly focus on injecting fixed or learnable noise into model weights or activations. In this paper, we find that the performance of existing SNNs is largely bottlenecked by their feature representation ability. Surprisingly, simply maximizing the variance per dimension of the feature distribution leads to a considerable boost beyond all previous methods; we name this approach the maximize feature distribution variance stochastic neural network (MFDV-SNN). Extensive experiments on well-known white- and black-box attacks show that MFDV-SNN achieves a significant improvement over existing methods, which indicates that it is a simple but effective method to improve model robustness.
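The abstract does not state the training objective, so the following is only a minimal sketch of what "maximizing the variance per dimension of the feature distribution" could look like. It assumes a Gaussian feature distribution with a learnable per-dimension log standard deviation, a linear classifier on the sampled feature, and a regularizer that subtracts the mean per-dimension variance from the cross-entropy loss. All function names, the `lam` weight, and the choice of regularizer are hypothetical illustrations, not the authors' implementation.

```python
import math
import random


def sample_feature(mean, log_std, rng):
    """Draw one stochastic feature z = mean + std * eps with eps ~ N(0, 1)."""
    return [m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(mean, log_std)]


def variance_bonus(log_std):
    """Mean per-dimension variance of the assumed Gaussian feature distribution."""
    return sum(math.exp(2.0 * s) for s in log_std) / len(log_std)


def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for a single example."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_norm - logits[label]


def mfdv_style_loss(mean, log_std, weights, label, lam, rng):
    """Cross-entropy on a sampled feature minus lam times the variance bonus.

    Assumption: subtracting the bonus rewards larger feature variance.
    The paper's actual objective may differ.
    """
    z = sample_feature(mean, log_std, rng)
    logits = [sum(w * zi for w, zi in zip(row, z)) for row in weights]
    return cross_entropy(logits, label) - lam * variance_bonus(log_std)
```

In this sketch the classification term discourages noise that flips the prediction while the bonus term pushes each feature dimension toward higher variance, so `lam` controls the trade-off between the two. An unbounded variance term would normally need a cap or a schedule in practice, which is omitted here.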