LG CR · Sep 2, 2024

Backdoor Defense through Self-Supervised and Generative Learning

Ivan Sabolić, Ivan Grubišić, Siniša Šegvić

arXiv:2409.01185v16.42 citations · h-index: 3

Originality: Highly original

AI Analysis

This work addresses security vulnerabilities in ML models used in applications such as autonomous systems. Rather than incrementally modifying existing discriminative defenses, it proposes a new defense that builds on generative modelling.

The paper tackles backdoor attacks in machine learning by fitting generative models in a self-supervised representation space to detect poisoned data and cleanse the training set. Training on the cleansed data greatly reduces the attack success rate while retaining accuracy on benign inputs.

Backdoor attacks change a small portion of the training data by introducing hand-crafted triggers and rewiring the corresponding labels towards a desired target class. Training on such data injects a backdoor which causes malicious inference on selected test samples. Most defenses mitigate such attacks through various modifications of the discriminative learning procedure. In contrast, this paper explores an approach based on generative modelling of per-class distributions in a self-supervised representation space. Interestingly, these representations are either preserved or heavily disturbed under recent backdoor attacks. In both cases, we find that per-class generative models make it possible to detect poisoned data and cleanse the dataset. Experiments show that training on the cleansed dataset greatly reduces the attack success rate and retains the accuracy on benign inputs.
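
The detection idea in the abstract can be illustrated with a toy sketch. The abstract does not say which generative model or self-supervised encoder the authors use, so everything below is an assumption made for illustration: the features are synthetic 2-D points standing in for self-supervised embeddings, each class is modelled with a diagonal Gaussian, and a sample is flagged when another class's model explains it better than the model of its own label. This covers only the case where the attack leaves the representations largely preserved, and it is not the paper's actual procedure.

```python
import math
import random


def fit_diag_gaussian(points):
    """Fit a diagonal Gaussian: per-dimension mean and variance."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    # A small constant keeps the variance strictly positive.
    var = [sum((p[d] - mean[d]) ** 2 for p in points) / n + 1e-6
           for d in range(dim)]
    return mean, var


def log_density(x, model):
    """Log-density of x under a diagonal Gaussian (mean, var)."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def flag_suspicious(features, labels):
    """Return indices whose own-label model is not the best-scoring one."""
    classes = sorted(set(labels))
    models = {
        c: fit_diag_gaussian([f for f, y in zip(features, labels) if y == c])
        for c in classes
    }
    flagged = []
    for i, (f, y) in enumerate(zip(features, labels)):
        scores = {c: log_density(f, models[c]) for c in classes}
        if max(scores, key=scores.get) != y:
            flagged.append(i)
    return flagged


# Synthetic stand-in for self-supervised features (assumption: 2-D,
# well-separated classes). Class 0 clusters near (0, 0), class 1 near (5, 5).
rng = random.Random(0)
features = [[rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(100)]
features += [[rng.gauss(5, 0.3), rng.gauss(5, 0.3)] for _ in range(100)]
labels = [0] * 100 + [1] * 100

# Simulate label poisoning: five class-0 samples are relabelled to the
# target class 1 while their features stay where they were.
poisoned = [0, 1, 2, 3, 4]
for i in poisoned:
    labels[i] = 1

flagged = flag_suspicious(features, labels)
print("flagged indices:", flagged)

# Cleansing step: drop the flagged samples before training a classifier.
clean = [(f, y) for i, (f, y) in enumerate(zip(features, labels))
         if i not in set(flagged)]
```

In a realistic setting the features would come from an encoder trained without labels, so that the poisoned labels cannot shape the representation, and the per-class density model would likely need to be more expressive than a diagonal Gaussian.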
