Backdoors Stuck At The Frontdoor: Multi-Agent Backdoor Attacks That Backfire
This work addresses security concerns in practical collaborative learning environments and suggests that current backdoor defense research may need re-evaluation, though its analysis of existing attack scenarios appears incremental.
The paper studies multi-agent backdoor attacks in collaborative learning. It finds that when multiple attackers attempt to poison a model simultaneously, they consistently achieve a low collective attack success rate, and the equilibrium attack success rate sits at the lower bound across the configurations examined.
Malicious agents in collaborative learning and outsourced data collection threaten the training of clean models. Backdoor attacks, in which an attacker poisons a model during training to achieve targeted misclassification, are a major concern for train-time robustness. In this paper, we investigate a multi-agent backdoor attack scenario, where multiple attackers attempt to backdoor a victim model simultaneously. We observe a consistent backfiring phenomenon across a wide range of games, in which agents suffer from a low collective attack success rate. We examine different backdoor attack configurations, non-cooperative and cooperative settings, joint distribution shifts, and game setups, and find that each returns an equilibrium attack success rate at the lower bound. The results motivate the re-evaluation of backdoor defense research for practical environments.
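To make the central metric concrete, the sketch below shows one way a per-agent and a collective attack success rate could be computed. It is an illustration under stated assumptions, not the paper's evaluation code: the function names, the toy model, and the choice to define the collective rate as the unweighted mean of the per-agent rates are all hypothetical, since the text above does not give the exact definition.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical input type for this toy example: (trigger_id, true_label).
Example = Tuple[int, int]


def attack_success_rate(
    predict: Callable[[Example], int],
    triggered_inputs: Sequence[Example],
    target_label: int,
) -> float:
    """Fraction of one agent's triggered inputs classified as its target label."""
    if not triggered_inputs:
        raise ValueError("triggered_inputs must not be empty")
    hits = sum(1 for x in triggered_inputs if predict(x) == target_label)
    return hits / len(triggered_inputs)


def collective_attack_success_rate(
    predict: Callable[[Example], int],
    agents: Sequence[Tuple[Sequence[Example], int]],
) -> float:
    """Unweighted mean of per-agent rates (an assumed definition)."""
    if not agents:
        raise ValueError("agents must not be empty")
    rates: List[float] = [
        attack_success_rate(predict, inputs, target) for inputs, target in agents
    ]
    return sum(rates) / len(rates)


def toy_model(x: Example) -> int:
    """Toy victim model: only trigger 0 took hold and maps to label 7."""
    trigger_id, true_label = x
    return 7 if trigger_id == 0 else true_label


# Agent 0 uses trigger 0 with target label 7; agent 1 uses trigger 1 with target label 3.
agent_0 = ([(0, 1), (0, 2), (0, 4), (0, 5)], 7)
agent_1 = ([(1, 1), (1, 2), (1, 4), (1, 5)], 3)

asr_0 = attack_success_rate(toy_model, *agent_0)
asr_1 = attack_success_rate(toy_model, *agent_1)
collective = collective_attack_success_rate(toy_model, [agent_0, agent_1])
print(asr_0, asr_1, collective)
```

In this toy setup only one agent's backdoor succeeds, so the assumed collective rate is half of the single successful agent's rate. The backfiring effect described above concerns the case where this collective figure is low across the games studied.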