LG · AI · CR · CV · Apr 21, 2021

Jacobian Regularization for Mitigating Universal Adversarial Perturbations

arXiv:2104.10459v2 · 9 citations
AI Analysis

This work addresses a practical security threat to machine learning systems: it mitigates low-cost, realistic universal attacks without sacrificing clean accuracy.

The paper tackles Universal Adversarial Perturbations (UAPs), single perturbations that fool a neural network across many inputs, and shows that Jacobian regularization increases robustness to UAPs by up to four times while maintaining clean performance.

Universal Adversarial Perturbations (UAPs) are input perturbations that can fool a neural network on large sets of data. They are a class of attacks that represents a significant threat as they facilitate realistic, practical, and low-cost attacks on neural networks. In this work, we derive upper bounds for the effectiveness of UAPs based on norms of data-dependent Jacobians. We empirically verify that Jacobian regularization greatly increases model robustness to UAPs by up to four times whilst maintaining clean performance. Our theoretical analysis also allows us to formulate a metric for the strength of shared adversarial perturbations between pairs of inputs. We apply this metric to benchmark datasets and show that it is highly correlated with the actual observed robustness. This suggests that realistic and practical universal attacks can be reliably mitigated without sacrificing clean accuracy, which shows promise for the robustness of machine learning systems.
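As a rough illustration of what Jacobian regularization means in practice, the sketch below adds a penalty on the squared Frobenius norm of the input-output Jacobian to a task loss. It is not the authors' implementation: the tiny one-layer tanh model, the squared-error task loss, the weight `lam`, and the function names `model`, `jacobian`, `frobenius_sq`, and `regularized_loss` are all assumptions chosen to keep the example self-contained. The intuition is the first-order one: for a small perturbation d, f(x + d) - f(x) is approximately J(x) d, so shrinking the norm of J(x) limits how far any single perturbation can move the output.

```python
import math


def model(x, W, b):
    """Hypothetical one-layer network f(x) = tanh(W x + b), a stand-in for a classifier."""
    return [
        math.tanh(sum(w * xj for w, xj in zip(row, x)) + bi)
        for row, bi in zip(W, b)
    ]


def jacobian(x, W, b):
    """Analytic input-output Jacobian of the model: J[i][j] = (1 - f_i(x)^2) * W[i][j]."""
    out = model(x, W, b)
    return [[(1.0 - o * o) * w for w in row] for o, row in zip(out, W)]


def frobenius_sq(J):
    """Squared Frobenius norm: the sum of squared entries of J."""
    return sum(v * v for row in J for v in row)


def regularized_loss(x, target, W, b, lam):
    """Squared-error task loss plus the penalty (lam / 2) * ||J(x)||_F^2 (assumed form)."""
    out = model(x, W, b)
    task = sum((o - t) ** 2 for o, t in zip(out, target))
    return task + 0.5 * lam * frobenius_sq(jacobian(x, W, b))


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    W = [[0.5, -0.3], [0.2, 0.8]]
    b = [0.1, -0.1]
    x = [0.4, -0.7]
    print("Jacobian penalty:", frobenius_sq(jacobian(x, W, b)))
    print("Regularized loss:", regularized_loss(x, [0.0, 0.0], W, b, lam=0.1))
```

For a real network the Jacobian would normally come from automatic differentiation rather than a hand-derived formula, and the penalty would be averaged over a training batch. Setting `lam` to zero recovers the unregularized task loss.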
