Backdoor Attacks on Bayesian Neural Networks using Reverse Distribution
This work addresses security vulnerabilities in outsourced ML training for industries using MLaaS, showing that BNNs are not immune to backdoor attacks.
The paper studies backdoor attacks on Bayesian neural networks (BNNs), which are often considered robust because they quantify uncertainty. It proposes a novel attack based on reverse distributions that achieves a 100% attack success rate, compared with less than 60% for state-of-the-art attacks.
Due to cost and time-to-market constraints, many industries outsource the training of machine learning (ML) models to third-party cloud service providers, popularly known as ML-as-a-Service (MLaaS). MLaaS creates an opportunity for an adversary to provide users with backdoored ML models that produce incorrect predictions only in extremely rare (attacker-chosen) scenarios. Bayesian neural networks (BNNs) are widely regarded as inherently robust against backdoor attacks, since their weights are modeled as marginal distributions that quantify uncertainty. In this paper, we propose a novel backdoor attack based on effective learning and targeted utilization of reverse distributions. This paper makes three important contributions. (1) To the best of our knowledge, this is the first backdoor attack that can effectively break the robustness of BNNs. (2) We produce reverse distributions that cancel the original distributions when the trigger is activated. (3) We propose an efficient solution for merging probability distributions in BNNs. Experimental results on diverse benchmark datasets demonstrate that our proposed attack achieves an attack success rate (ASR) of 100%, while the ASR of state-of-the-art attacks is lower than 60%.
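The ASR figures above are easier to interpret with the metric written out. The following is a minimal sketch that assumes the commonly used definition of ASR: the fraction of trigger-stamped inputs that the model assigns to the attacker's target class, excluding inputs whose true label is already the target. The function name `attack_success_rate` and its parameters are hypothetical, and the exact evaluation protocol used in the paper may differ.

```python
from typing import Sequence


def attack_success_rate(
    predictions: Sequence[int],
    true_labels: Sequence[int],
    target_label: int,
) -> float:
    """Fraction of triggered inputs classified as target_label.

    Assumption (not taken from the paper): inputs whose true label already
    equals the target are excluded, since classifying them as the target
    does not indicate a successful attack.
    """
    if len(predictions) != len(true_labels):
        raise ValueError("predictions and true_labels must have equal length")

    # Keep only predictions for inputs that do not already belong to the target class.
    eligible = [p for p, y in zip(predictions, true_labels) if y != target_label]
    if not eligible:
        raise ValueError("no eligible inputs: every true label equals the target")

    hits = sum(1 for p in eligible if p == target_label)
    return hits / len(eligible)


# Hypothetical predictions on five trigger-stamped inputs, target class 7.
asr = attack_success_rate([7, 7, 3, 7, 7], [1, 2, 3, 7, 4], target_label=7)
print(f"ASR = {asr:.2%}")
```

For a BNN, each entry of `predictions` would typically be the class chosen after averaging the outputs of several sampled weight configurations; that sampling step is left out of this sketch.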