CR · CV · LG · Feb 22, 2024

Mudjacking: Patching Backdoor Vulnerabilities in Foundation Models

arXiv:2402.14977v1 · 6 citations · h-index: 11 · USENIX Security Symposium
Originality: Incremental advance
AI Analysis

This work addresses a critical security issue for AI systems that rely on foundation models, though it is an incremental improvement in backdoor defense methods.

The authors tackle backdoor vulnerabilities in foundation models, which can propagate to downstream classifiers. They propose Mudjacking, a method that removes backdoors while maintaining model utility across vision and language models, eleven datasets, and multiple attack types.

Foundation models have become the backbone of the AI ecosystem. In particular, a foundation model can be used as a general-purpose feature extractor to build various downstream classifiers. However, foundation models are vulnerable to backdoor attacks, and a backdoored foundation model is a single point of failure of the AI ecosystem, e.g., multiple downstream classifiers inherit the backdoor vulnerabilities simultaneously. In this work, we propose Mudjacking, the first method to patch foundation models to remove backdoors. Specifically, given a misclassified trigger-embedded input detected after a backdoored foundation model is deployed, Mudjacking adjusts the parameters of the foundation model to remove the backdoor. We formulate patching a foundation model as an optimization problem and propose a gradient-descent-based method to solve it. We evaluate Mudjacking on both vision and language foundation models, eleven benchmark datasets, five existing backdoor attacks, and thirteen adaptive backdoor attacks. Our results show that Mudjacking can remove the backdoor from a foundation model while maintaining its utility.
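
The abstract describes patching as an optimization problem solved by gradient descent but does not state the objective. The toy sketch below illustrates the general idea only and is not the paper's formulation. It assumes a linear feature extractor, a trigger-free reference input (`x_reference`) paired with the detected trigger-embedded input, and two hypothetical squared-error terms: one that pulls the trigger input's embedding toward the reference embedding, and one that keeps clean embeddings close to those of the deployed model. All function and variable names are invented for illustration.

```python
# Toy sketch only: the "foundation model" is a linear feature extractor f(x) = W x.
# The two loss terms and every name below are illustrative assumptions,
# not the objective or code from the Mudjacking paper.

def embed(W, x):
    """Feature vector W x for a weight matrix W (list of rows) and input x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def patch(W_backdoored, x_trigger, x_reference, clean_inputs,
          lam=1.0, lr=0.1, steps=500):
    """Adjust W by plain gradient descent on a weighted sum of squared errors."""
    W_old = [row[:] for row in W_backdoored]  # frozen copy of the deployed model
    W = [row[:] for row in W_backdoored]      # parameters being patched

    # Each term is (input, target embedding, weight).
    # Assumed "effectiveness" term: trigger input should embed like the reference.
    terms = [(x_trigger, embed(W_old, x_reference), 1.0)]
    # Assumed "utility" term: clean inputs should keep their old embeddings.
    terms += [(x, embed(W_old, x), lam) for x in clean_inputs]

    for _ in range(steps):
        grad = [[0.0] * len(row) for row in W]
        for x, target, weight in terms:
            residual = [o - t for o, t in zip(embed(W, x), target)]
            # Gradient of weight * ||W x - target||^2 with respect to W[i][j].
            for i, r in enumerate(residual):
                for j, xj in enumerate(x):
                    grad[i][j] += 2.0 * weight * r * xj
        for i, row in enumerate(grad):
            for j, g in enumerate(row):
                W[i][j] -= lr * g
    return W

# Hypothetical data: the third input coordinate plays the role of the trigger,
# and the backdoored weights react strongly to it.
W_backdoored = [[1.0, 0.0, 5.0], [0.0, 1.0, -5.0]]
x_reference = [1.0, 0.5, 0.0]   # trigger-free version of the detected input
x_trigger = [1.0, 0.5, 1.0]     # same input with the trigger present
clean_inputs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

W_patched = patch(W_backdoored, x_trigger, x_reference, clean_inputs)

reference_embedding = embed(W_backdoored, x_reference)
gap_before = sq_dist(embed(W_backdoored, x_trigger), reference_embedding)
gap_after = sq_dist(embed(W_patched, x_trigger), reference_embedding)
utility_drift = max(
    sq_dist(embed(W_patched, x), embed(W_backdoored, x)) for x in clean_inputs
)
```

In this toy setting the trigger-induced embedding gap starts at 50 and shrinks toward zero after patching, while the clean inputs' embeddings stay essentially unchanged. A real foundation model is nonlinear and far larger, so the loss terms, step size, and stopping rule would all need to be chosen differently.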

