CL | LG | Jun 27, 2023

On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection

arXiv:2306.15705v1223 citations | h-index: 70
AI Analysis

The work addresses privacy and generalizability concerns in adversarial detection for socially-secure applications by offering a data-free solution.

The paper tackles the problem of detecting adversarial samples without access to the original training data. It does so by leveraging Universal Adversarial Perturbations (UAPs), and it reports competitive detection performance on text classification tasks at a time cost equivalent to normal inference.

Detecting adversarial samples that are carefully crafted to fool the model is a critical step toward socially-secure applications. However, existing adversarial detection methods require access to sufficient training data, which raises notable concerns regarding privacy leakage and generalizability. In this work, we validate that adversarial samples generated by attack algorithms are strongly related to a specific vector in the high-dimensional inputs. Such vectors, namely UAPs (Universal Adversarial Perturbations), can be calculated without the original training data. Based on this discovery, we propose a data-agnostic adversarial detection framework that induces different responses to UAPs between normal and adversarial samples. Experimental results show that our method achieves competitive detection performance on various text classification tasks while keeping time consumption equivalent to normal inference.
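
The idea of separating normal from adversarial samples by their response to a UAP can be illustrated with a toy sketch. This is not the paper's implementation: the linear softmax classifier, the hand-picked `UAP` vector, the `uap_response` score, the direction of the effect (suspect inputs reacting more strongly), and the `threshold` value are all assumptions made for illustration only.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(weights, x):
    # Toy linear classifier: one weight vector per class (assumption).
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    return softmax(logits)

def uap_response(weights, x, uap):
    # Drop in confidence of the originally predicted class after adding the UAP.
    base = predict_proba(weights, x)
    label = max(range(len(base)), key=base.__getitem__)
    shifted = predict_proba(weights, [a + b for a, b in zip(x, uap)])
    return base[label] - shifted[label]

def is_adversarial(weights, x, uap, threshold=0.2):
    # Hypothetical decision rule: flag inputs that react strongly to the UAP.
    return uap_response(weights, x, uap) > threshold

# Hypothetical two-class model and perturbation, chosen by hand.
WEIGHTS = [[1.0, 0.0], [0.0, 1.0]]
UAP = [-0.5, 0.5]        # nudges every input toward class 1
clean = [3.0, 0.0]       # confidently class 0
suspect = [0.6, 0.4]     # barely class 0, close to the decision boundary

print(is_adversarial(WEIGHTS, clean, UAP))    # False
print(is_adversarial(WEIGHTS, suspect, UAP))  # True
```

In this sketch, detection needs only forward passes through the model and no training data, which mirrors the data-free property claimed in the abstract. How the UAP is actually computed and how the response is scored in the paper is not specified here.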
