CV | IV | Jun 30, 2023

Efficient Backdoor Removal Through Natural Gradient Fine-tuning

arXiv:2306.17441v1 | 1 citation | h-index: 30
Originality: Incremental advance
AI Analysis

This paper addresses backdoor attacks on deep learning models, offering an efficient and broadly applicable defense. The advance is incremental, since it builds on existing fine-tuning and optimization techniques.

The paper tackles the problem of backdoor attacks in deep neural networks by proposing Natural Gradient Fine-tuning (NGF), a method that purifies models by fine-tuning only one layer with a geometry-aware optimizer and regularizer, achieving state-of-the-art performance across four datasets and 13 backdoor attacks.

The success of a deep neural network (DNN) relies heavily on the details of the training scheme, e.g., training data, architecture, and hyper-parameters. Recent backdoor attacks show that an adversary can exploit these training details to compromise the integrity of a DNN. Our studies show that a backdoor model is usually optimized to a bad local minimum, i.e., a sharper minimum than that of a benign model. Intuitively, a backdoor model can be purified by re-optimizing it to a smoother minimum through fine-tuning on a small amount of clean validation data. However, fine-tuning all DNN parameters is computationally expensive and often results in sub-par clean test performance. To address this concern, we propose a novel backdoor purification technique, Natural Gradient Fine-tuning (NGF), which removes the backdoor by fine-tuning only one layer. Specifically, NGF uses a loss-surface-geometry-aware optimizer that overcomes the difficulty of reaching a smooth minimum when only one layer is optimized. To improve the generalization performance of the method, we introduce a clean-data-distribution-aware regularizer based on knowledge of the loss surface curvature matrix, i.e., the Fisher Information Matrix. Extensive experiments show that the proposed method achieves state-of-the-art performance on a wide range of backdoor defense benchmarks: four datasets (CIFAR10, GTSRB, Tiny-ImageNet, and ImageNet) and 13 recent backdoor attacks, e.g., Blend, Dynamic, WaNet, and ISSBA.
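
To make the idea of a geometry-aware, one-layer update concrete, here is a minimal sketch of a natural-gradient step, w <- w - lr * F^(-1) * grad, where F is the Fisher Information Matrix. It is not the authors' NGF implementation. It assumes a toy setting: a single logistic-regression layer on fixed features (standing in for a frozen backbone), a diagonal approximation of F with damping, synthetic "clean" data, and an arbitrary poorly fitted starting weight standing in for a compromised layer. The regularizer described in the abstract is omitted, and all names (`natural_gradient_step`, `fisher_diag`, `damping`) are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, x):
    # Probability of class 1 from the single trainable layer.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def mean_loss(w, feats, labels):
    # Average binary cross-entropy on the clean set.
    total = 0.0
    for x, y in zip(feats, labels):
        p = min(max(predict(w, x), 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(feats)

def natural_gradient_step(w, feats, labels, lr=0.3, damping=1e-3):
    # One update w <- w - lr * grad / (diag(F) + damping).
    # Assumption: F is approximated by its diagonal; for a logistic layer
    # the per-sample Fisher diagonal is p * (1 - p) * x_j ** 2.
    n, d = len(feats), len(w)
    grad = [0.0] * d
    fisher_diag = [0.0] * d
    for x, y in zip(feats, labels):
        p = predict(w, x)
        for j in range(d):
            grad[j] += (p - y) * x[j] / n
            fisher_diag[j] += p * (1.0 - p) * x[j] ** 2 / n
    return [w[j] - lr * grad[j] / (fisher_diag[j] + damping) for j in range(d)]

# Synthetic clean validation data (illustrative only); last feature is a bias.
random.seed(0)
true_w = [2.0, -1.0, 0.5]
feats = [[random.gauss(0, 1), random.gauss(0, 1), 1.0] for _ in range(200)]
labels = [1 if sum(a * b for a, b in zip(true_w, x)) > 0 else 0 for x in feats]

# Start from a poorly fitted layer and fine-tune only this layer.
w = [-2.0, 1.0, -0.5]
loss_before = mean_loss(w, feats, labels)
for _ in range(50):
    w = natural_gradient_step(w, feats, labels)
loss_after = mean_loss(w, feats, labels)
print("clean loss before:", loss_before)
print("clean loss after:", loss_after)
```

Dividing each gradient coordinate by its Fisher entry rescales the step by the local curvature, which is the sense in which the optimizer is "aware" of the loss surface geometry. A full Fisher matrix, or a structured approximation of it, would also capture correlations between parameters that the diagonal version ignores.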

Code Implementations: 1 repo