LG · CR · CV · ML · May 26, 2019

Purifying Adversarial Perturbation with Adversarially Trained Auto-encoders

arXiv:1905.10729v1 · 1 citation
Originality: Incremental advance
AI Analysis

This work addresses the high cost of adversarial training for machine learning practitioners by training one external auto-encoder that can be reused to protect other models. The advance is incremental because it builds on existing adversarial training techniques.

The paper tackles the high cost of adversarial training, which otherwise has to be repeated for every model that needs protection. It applies iterative adversarial training to an external auto-encoder, which can then be used to protect other models directly. The results show that this method outperforms other purifying-based methods against white-box attacks and transfers well to models with different architectures.

Machine learning models are vulnerable to adversarial examples. Iterative adversarial training has shown promising results against strong white-box attacks. However, adversarial training is very expensive, and every time a model needs to be protected, this expensive training scheme must be performed again. In this paper, we propose to apply an iterative adversarial training scheme to an external auto-encoder, which, once trained, can be used to protect other models directly. We empirically show that our model outperforms other purifying-based methods against white-box attacks and transfers well to directly protect other base models with different architectures.
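The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the purifier here is a toy linear map standing in for an auto-encoder, the base classifier is a fixed logistic model, and every name and value (`purify`, `pgd`, `train_purifier`, `eps`, `alpha`, `steps`, `lr`, the 2-D data) is a hypothetical choice made for illustration. The loss is assumed to be the classifier's cross-entropy only. What the sketch does show is the structure: an inner iterative attack through the composed model, and an outer update that touches only the purifier while the classifier stays frozen.

```python
import math
import random

# Frozen base classifier (assumed pretrained): p(y=1|z) = sigmoid(W . z + B).
W, B = (1.0, 1.0), 0.0

def purify(params, x):
    """Toy linear purifier g(x) = A x + c, a stand-in for a real auto-encoder."""
    (a00, a01, a10, a11), (c0, c1) = params
    return (a00 * x[0] + a01 * x[1] + c0, a10 * x[0] + a11 * x[1] + c1)

def loss_and_grad_z(z, y):
    """Cross-entropy of the frozen classifier on z, and its gradient w.r.t. z."""
    logit = W[0] * z[0] + W[1] * z[1] + B
    m = logit if y == 1 else -logit  # signed margin for the true label
    loss = max(0.0, -m) + math.log1p(math.exp(-abs(m)))  # stable log(1 + e^-m)
    if logit >= 0:
        p = 1.0 / (1.0 + math.exp(-logit))
    else:
        p = math.exp(logit) / (1.0 + math.exp(logit))
    d = p - y  # derivative of the loss w.r.t. the logit
    return loss, (d * W[0], d * W[1])

def pgd(params, x, y, eps=0.3, alpha=0.1, steps=5):
    """White-box L-infinity PGD through the composed model classifier(purify(x))."""
    (a00, a01, a10, a11), _ = params
    delta = [0.0, 0.0]
    for _ in range(steps):
        x_adv = (x[0] + delta[0], x[1] + delta[1])
        _, gz = loss_and_grad_z(purify(params, x_adv), y)
        # Chain rule through the purifier: grad_x = A^T grad_z.
        gx = (a00 * gz[0] + a10 * gz[1], a01 * gz[0] + a11 * gz[1])
        for i in range(2):
            step = alpha * ((gx[i] > 0) - (gx[i] < 0))  # alpha * sign(gradient)
            delta[i] = max(-eps, min(eps, delta[i] + step))  # project to eps-ball
    return (x[0] + delta[0], x[1] + delta[1])

def adversarial_loss(params, data):
    """Mean classifier loss on PGD examples crafted against the full pipeline."""
    total = 0.0
    for x, y in data:
        total += loss_and_grad_z(purify(params, pgd(params, x, y)), y)[0]
    return total / len(data)

def train_purifier(data, epochs=20, lr=0.05):
    """Iterative adversarial training of the purifier only."""
    A, c = [1.0, 0.0, 0.0, 1.0], [0.0, 0.0]  # start from the identity map
    for _ in range(epochs):
        for x, y in data:
            params = (tuple(A), tuple(c))
            x_adv = pgd(params, x, y)  # inner maximisation
            _, gz = loss_and_grad_z(purify(params, x_adv), y)
            # Outer minimisation: SGD step on A and c; W and B are never updated.
            for i in range(2):
                for j in range(2):
                    A[2 * i + j] -= lr * gz[i] * x_adv[j]
                c[i] -= lr * gz[i]
    return (tuple(A), tuple(c))

# Hypothetical toy data: two Gaussian blobs in 2-D.
random.seed(0)
data = []
for _ in range(20):
    data.append(((1 + random.gauss(0, 0.3), 1 + random.gauss(0, 0.3)), 1))
    data.append(((-1 + random.gauss(0, 0.3), -1 + random.gauss(0, 0.3)), 0))

identity = ((1.0, 0.0, 0.0, 1.0), (0.0, 0.0))
trained = train_purifier(data)
loss_before = adversarial_loss(identity, data)
loss_after = adversarial_loss(trained, data)
print("adversarial loss without trained purifier:", loss_before)
print("adversarial loss with trained purifier:", loss_after)
```

At inference time the protected model in this sketch is simply `classifier(purify(x))`. The transfer claim in the abstract would correspond to placing the same trained purifier in front of a different base classifier, which this toy example does not evaluate.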

