LG | CR | Apr 20, 2023

Get Rid Of Your Trail: Remotely Erasing Backdoors in Federated Learning

arXiv:2304.10638v1 | 29 citations | h-index: 28
Originality: Incremental advance
AI Analysis

The paper addresses a security problem in federated learning from the adversary's side, but the advance is incremental because it builds on existing backdoor attacks and unlearning concepts.

The paper tackles a problem that adversaries face: they need to remove their backdoors from federated learning models to avoid detection. It proposes a method based on machine unlearning that erases backdoors while preserving model performance, demonstrated in image classification scenarios on backdoors injected by state-of-the-art attacks.

Federated Learning (FL) enables collaborative deep learning training across multiple participants without exposing sensitive personal data. However, the distributed nature of FL and the unvetted participants' data make it vulnerable to backdoor attacks. In these attacks, adversaries inject malicious functionality into the centralized model during training, leading to intentional misclassifications for specific adversary-chosen inputs. While previous research has demonstrated successful injections of persistent backdoors in FL, this persistence also poses a challenge: the continued presence of a backdoor in the centralized model can prompt the central aggregation server to take preventive measures to penalize the adversaries. Therefore, this paper proposes a methodology that enables adversaries to effectively remove backdoors from the centralized model upon achieving their objectives or upon suspicion of possible detection. The proposed approach extends the concept of machine unlearning and presents strategies that preserve the performance of the centralized model while preventing over-unlearning of information unrelated to backdoor patterns, so that the adversaries remain stealthy while removing backdoors. To the best of our knowledge, this is the first work that explores machine unlearning in FL to remove backdoors to the benefit of adversaries. Exhaustive evaluation on image classification scenarios demonstrates that the proposed method efficiently removes backdoors injected into the centralized model by state-of-the-art attacks across multiple configurations.
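To make the idea of unlearning a backdoor without over-unlearning more concrete, here is a minimal toy sketch. It is not the paper's algorithm, which the abstract does not spell out. It assumes a logistic-regression model in which one input feature acts as the trigger, performs gradient ascent on the backdoor loss and gradient descent on the clean loss, and projects the weights back into an L2 ball around the original model as a simple stand-in for limiting over-unlearning. All names (`unlearn_backdoor`, `radius`, the toy data) are hypothetical, and the federated aggregation step is not modeled.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_grad(w, X, y):
    """Gradient of the mean binary cross-entropy of a logistic-regression model."""
    return X.T @ (sigmoid(X @ w) - y) / len(y)

def accuracy(w, X, y):
    """Fraction of samples whose thresholded prediction equals the given label."""
    return float(np.mean((X @ w > 0).astype(float) == y))

def unlearn_backdoor(w_ref, X_bd, y_bd, X_clean, y_clean,
                     lr=0.5, steps=200, radius=3.0):
    """Toy unlearning loop (illustrative assumption, not the paper's method).

    Ascends the loss on triggered samples, descends the loss on clean samples,
    and keeps the weights within `radius` (L2) of the reference model.
    """
    w = w_ref.copy()
    for _ in range(steps):
        g = -bce_grad(w, X_bd, y_bd) + bce_grad(w, X_clean, y_clean)
        w -= lr * g
        delta = w - w_ref
        norm = np.linalg.norm(delta)
        if norm > radius:  # project back to limit drift from the reference model
            w = w_ref + delta * (radius / norm)
    return w

# Features: [x1, trigger, bias]. Clean rule: label 1 iff x1 > 0.
X_clean = np.array([[1., 0., 1.], [2., 0., 1.], [-1., 0., 1.], [-2., 0., 1.]])
y_clean = np.array([1., 1., 0., 0.])
# Triggered inputs with negative x1 that the backdoor maps to target label 1.
X_bd = np.array([[-1., 1., 1.], [-2., 1., 1.]])
y_bd = np.array([1., 1.])

# Hypothetical backdoored model: a large weight on the trigger feature.
w_backdoored = np.array([2., 5., 0.])
w_cleaned = unlearn_backdoor(w_backdoored, X_bd, y_bd, X_clean, y_clean)

print("backdoor success before:", accuracy(w_backdoored, X_bd, y_bd))
print("backdoor success after: ", accuracy(w_cleaned, X_bd, y_bd))
print("clean accuracy after:   ", accuracy(w_cleaned, X_clean, y_clean))
```

Here accuracy on the triggered samples, measured against the target label, serves as the backdoor success rate. The projection radius expresses the trade-off the abstract describes: too small and the backdoor survives, too large and the model may drift away from behavior unrelated to the trigger.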
