SecureSplit: Mitigating Backdoor Attacks in Split Learning
This work addresses a security vulnerability in collaborative machine learning for privacy-sensitive applications and represents an incremental improvement over existing defenses.
The paper tackles backdoor attacks in Split Learning, in which malicious clients insert hidden triggers. It introduces SecureSplit, a defense mechanism that uses dimensionality transformation and adaptive filtering to mitigate these attacks. In evaluations across four datasets and five attack scenarios, SecureSplit outperforms seven alternative defenses.
Split Learning (SL) offers a framework for collaborative model training that respects data privacy by allowing participants to share the same dataset while maintaining distinct feature sets. However, SL is susceptible to backdoor attacks, in which malicious clients subtly alter their embeddings to insert hidden triggers that compromise the final trained model. To address this vulnerability, we introduce SecureSplit, a defense mechanism tailored to SL. SecureSplit applies a dimensionality transformation strategy to accentuate subtle differences between benign and poisoned embeddings, facilitating their separation. Building on this enhanced distinction, we develop an adaptive filtering approach that uses a majority-based voting scheme to remove contaminated embeddings while preserving clean ones. Experiments on four datasets (CIFAR-10, MNIST, CINIC-10, and ImageNette), covering five backdoor attack scenarios and comparing against seven alternative defenses, confirm the effectiveness of SecureSplit under various challenging conditions.
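The abstract does not specify the transformation or the voting rule, so the following is only a minimal sketch of the general transform-then-vote idea, not SecureSplit itself. It assumes PCA as a stand-in for the dimensionality transformation and three median/MAD outlier tests (on L1, L2, and L-infinity distances) as the voters. The function names, the number of components, and the threshold of 3.5 are all hypothetical choices made for illustration.

```python
import numpy as np


def transform(embeddings: np.ndarray, n_components: int = 3) -> np.ndarray:
    """Project embeddings onto their top principal components (PCA via SVD).

    Assumption: PCA stands in for the paper's dimensionality transformation,
    which the abstract does not describe.
    """
    centered = embeddings - embeddings.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def robust_outliers(scores: np.ndarray, threshold: float = 3.5) -> np.ndarray:
    """Flag scores that lie far above the median, measured in MAD units."""
    med = np.median(scores)
    mad = np.median(np.abs(scores - med)) + 1e-12  # guard against zero MAD
    return 0.6745 * (scores - med) / mad > threshold


def filter_embeddings(
    embeddings: np.ndarray, n_components: int = 3, threshold: float = 3.5
) -> np.ndarray:
    """Return a boolean mask that is True for embeddings kept as clean.

    Assumption: three distance-based voters with a simple majority rule;
    the cutoffs adapt to each batch through its own median and MAD.
    """
    z = transform(embeddings, n_components)
    offsets = z - np.median(z, axis=0)
    votes = np.stack([
        robust_outliers(np.linalg.norm(offsets, ord=1, axis=1), threshold),
        robust_outliers(np.linalg.norm(offsets, ord=2, axis=1), threshold),
        robust_outliers(np.linalg.norm(offsets, ord=np.inf, axis=1), threshold),
    ])
    # An embedding is removed only if more than half of the voters flag it.
    flagged = votes.sum(axis=0) * 2 > votes.shape[0]
    return ~flagged
```

Under these assumptions, a server could keep only the retained rows with `clean = embeddings[filter_embeddings(embeddings)]`. The majority rule means a single overly sensitive voter cannot discard a benign embedding on its own, which is one plausible reading of why a voting scheme helps preserve clean data.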