Evaluating the Robustness of Bayesian Neural Networks Against Different Types of Attacks
This work addresses robustness against cyberattacks for safety-critical machine learning systems, but it is incremental as it builds on existing BNN methods.
The study evaluates Bayesian neural networks (BNNs) for robustness in image classification and finds that, without adversarial training, they are significantly more robust to adversarial attacks generated against a deterministic network; a stochastic classifier placed after a deterministic CNN feature extractor provides sufficient robustness enhancement.
To evaluate the robustness gain of Bayesian neural networks on image classification tasks, we apply input perturbations and adversarial attacks to state-of-the-art Bayesian neural networks, with a benchmark CNN model as a reference. The attacks are selected to simulate signal interference and cyberattacks against CNN-based machine learning systems. The results show that a Bayesian neural network achieves significantly higher robustness against adversarial attacks generated against a deterministic neural network model, without adversarial training. The Bayesian posterior can also act as a safety precursor of ongoing malicious activity. Furthermore, we show that a stochastic classifier placed after a deterministic CNN feature extractor provides sufficient robustness enhancement, so a stochastic feature extractor before the stochastic classifier is not required. This finding informs the use of stochastic layers when building decision-making pipelines in safety-critical domains.
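The arrangement described above, a deterministic feature extractor followed by a stochastic classifier, can be illustrated with a minimal sketch. It is not the models or the inference method used in this work. It assumes an independent Gaussian distribution over the classifier weights and a Monte Carlo estimate of the predictive distribution. All names (`deterministic_features`, `sample_logits`, `predictive_distribution`, `predictive_entropy`) and all numeric values are hypothetical, and the "extractor" is a toy stand-in for a CNN.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def deterministic_features(pixels):
    """Toy stand-in for a deterministic CNN extractor (assumption): two fixed summary features."""
    mean = sum(pixels) / len(pixels)
    spread = max(pixels) - min(pixels)
    return [mean, spread]


def sample_logits(features, weight_means, weight_stds, rng):
    """One stochastic forward pass: each weight is drawn from N(mean, std^2)."""
    logits = []
    for means_row, stds_row in zip(weight_means, weight_stds):
        logit = sum(
            rng.gauss(mu, sd) * f
            for mu, sd, f in zip(means_row, stds_row, features)
        )
        logits.append(logit)
    return logits


def predictive_distribution(pixels, weight_means, weight_stds, n_samples=200, seed=0):
    """Monte Carlo average of class probabilities over sampled classifier weights."""
    rng = random.Random(seed)
    features = deterministic_features(pixels)  # computed once, no sampling here
    avg = [0.0] * len(weight_means)
    for _ in range(n_samples):
        probs = softmax(sample_logits(features, weight_means, weight_stds, rng))
        avg = [a + p / n_samples for a, p in zip(avg, probs)]
    return avg


def predictive_entropy(probs):
    """Entropy (in nats) of the averaged prediction; higher means less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical two-class classifier: one row of weight means and stds per class.
WEIGHT_MEANS = [[2.0, -1.0], [-2.0, 1.0]]
WEIGHT_STDS = [[0.3, 0.3], [0.3, 0.3]]

probs = predictive_distribution([0.9, 0.8, 0.85, 0.95], WEIGHT_MEANS, WEIGHT_STDS)
entropy = predictive_entropy(probs)
```

In a sketch like this, the entropy of the averaged prediction is one possible uncertainty signal that a monitoring component could threshold to flag suspicious inputs. Which statistic of the posterior serves as the safety precursor in this work is not specified here, so the entropy is an illustrative choice only.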