Protecting against simultaneous data poisoning attacks
This work addresses a critical security vulnerability in ML systems trained on large, potentially compromised datasets. Instead of refining defenses that assume a single attack, it evaluates them in the more realistic setting where several poisoning attacks happen at once.
The paper tackles simultaneous data poisoning attacks on machine learning models. It shows that existing defenses fail in this setting and introduces BaDLoss, a defense that reduces the average attack success rate to 7.98% on CIFAR-10 and 10.29% on GTSRB. Other defenses average 64.48% and 84.28%, respectively.
Current backdoor defense methods are evaluated against a single attack at a time. This is unrealistic, as powerful machine learning systems are trained on large datasets scraped from the internet, which may be attacked multiple times by one or more attackers. We demonstrate that simultaneously executed data poisoning attacks can effectively install multiple backdoors in a single model without substantially degrading clean accuracy. Furthermore, we show that existing backdoor defense methods do not effectively prevent attacks in this setting. Finally, we leverage insights into the nature of backdoor attacks to develop a new defense, BaDLoss, that is effective in the multi-attack setting. With minimal clean accuracy degradation, BaDLoss attains an average attack success rate in the multi-attack setting of 7.98% on CIFAR-10 and 10.29% on GTSRB, compared with averages of 64.48% and 84.28%, respectively, for other defenses.
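To make the reported metric concrete, the following is a minimal Python sketch of how a per-attack attack success rate (ASR) and its multi-attack average could be computed. It assumes two common conventions. First, ASR is the fraction of test inputs from outside the target class that the model assigns to the attacker's target label once that attack's trigger is applied. Second, the average is an unweighted mean over attacks. The paper's exact protocol may differ. The toy model, `LEARNED_BACKDOORS`, `make_trigger`, and the (label, trigger id) input encoding are hypothetical stand-ins. They are not BaDLoss or the paper's models.

```python
from typing import Callable, Sequence, TypeVar

X = TypeVar("X")


def attack_success_rate(
    predict: Callable[[X], int],
    apply_trigger: Callable[[X], X],
    inputs: Sequence[X],
    labels: Sequence[int],
    target_label: int,
) -> float:
    """Fraction of non-target-class test inputs that the model sends to the
    attacker's target label once this attack's trigger is applied
    (assumed convention: target-class inputs are excluded)."""
    eligible = [x for x, y in zip(inputs, labels) if y != target_label]
    if not eligible:
        raise ValueError("no test inputs outside the target class")
    hits = sum(predict(apply_trigger(x)) == target_label for x in eligible)
    return hits / len(eligible)


def average_asr(asrs: Sequence[float]) -> float:
    """Unweighted mean of per-attack ASRs, one entry per simultaneous attack
    (an assumption about how the multi-attack average is formed)."""
    return sum(asrs) / len(asrs)


# Hypothetical toy setup: an input is (true_label, trigger_id), with
# trigger_id None meaning a clean input. The toy "model" has learned the
# backdoors for triggers 0 and 1 but not for trigger 2.
LEARNED_BACKDOORS = {0: 3, 1: 7}  # trigger id -> label the model outputs


def toy_predict(x):
    label, trigger = x
    return LEARNED_BACKDOORS.get(trigger, label)


def make_trigger(trigger_id):
    # Returns a function that stamps this attack's trigger onto an input.
    return lambda x: (x[0], trigger_id)


inputs = [(y, None) for y in range(10)]
labels = list(range(10))
attacks = [(0, 3), (1, 7), (2, 5)]  # (trigger id, target label), one per attack

asrs = [
    attack_success_rate(toy_predict, make_trigger(t), inputs, labels, target)
    for t, target in attacks
]
print(asrs)  # [1.0, 1.0, 0.0]
print(round(average_asr(asrs), 4))  # 0.6667
```

In this toy run, the model has absorbed two of the three simultaneous backdoors, so the average ASR is about 0.67. This is why the multi-attack setting is harder: a defense only achieves a low average if it suppresses every backdoor, not just one.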