LG · CR · CV · Dec 11, 2017

Training Ensembles to Detect Adversarial Examples

arXiv:1712.04006v1 · 40 citations
Originality: Incremental advance
AI Analysis

This paper addresses the vulnerability of neural networks to adversarial attacks, a critical issue for deploying AI in safety-sensitive applications. The contribution appears incremental, since it builds on existing ensemble and detection techniques.

The paper tackles the detection of adversarial examples in neural networks. It proposes an ensemble training method that keeps classification error low on benign data while minimizing agreement among ensemble members on out-of-distribution examples, and it reports improved detection rates against attacks including DeepFool and C&W on MNIST and CIFAR-10.

We propose a new ensemble method for detecting and classifying adversarial examples generated by state-of-the-art attacks, including DeepFool and C&W. Our method works by training the members of an ensemble to have low classification error on random benign examples while simultaneously minimizing agreement on examples outside the training distribution. We evaluate on both MNIST and CIFAR-10, against oblivious and both white- and black-box adversaries.
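The training idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' objective: the agreement penalty (mean pairwise dot product of predicted distributions), the weight `lam`, the unanimous-vote detection rule, and all function names are assumptions chosen for illustration only.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    # probs: (members, batch, classes); labels: (batch,)
    batch = probs.shape[1]
    picked = probs[:, np.arange(batch), labels]  # (members, batch)
    return float(-np.log(picked + 1e-12).mean())

def pairwise_agreement(probs):
    # Assumed agreement measure: average dot product between the
    # predicted distributions of every ordered pair of distinct members.
    m = probs.shape[0]
    gram = np.einsum("ibc,jbc->ijb", probs, probs)  # (m, m, batch)
    off_diag = gram.sum(axis=(0, 1)) - np.einsum("iib->b", gram)
    return float((off_diag / (m * (m - 1))).mean())

def ensemble_loss(benign_logits, labels, ood_logits, lam=1.0):
    # Low error on benign inputs plus a penalty for agreeing on
    # out-of-distribution inputs; lam is a hypothetical trade-off weight.
    fit = cross_entropy(softmax(benign_logits), labels)
    agree = pairwise_agreement(softmax(ood_logits))
    return fit + lam * agree

def flag_disagreement(logits):
    # Assumed detection rule: flag an input when the members'
    # top-1 predictions are not unanimous.
    votes = logits.argmax(axis=-1)  # (members, batch)
    return (votes != votes[0]).any(axis=0)
```

In this sketch, `ensemble_loss` would be minimized over the members' parameters by a gradient-based framework, and `flag_disagreement` would be applied at test time to logits of shape (members, batch, classes).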

Code Implementations: 1 repo
