$n$-ML: Mitigating Adversarial Examples via Ensembles of Topologically Manipulated Classifiers
This work addresses the vulnerability of ML systems to adversarial attacks, a critical problem for deploying reliable AI in safety-sensitive domains.
The paper tackles adversarial examples in machine learning by proposing $n$-ML, an ensemble defense that trains multiple classifiers specifically to classify adversarial examples differently. The approach roughly retains the benign accuracy of state-of-the-art models on the MNIST, CIFAR10, and GTSRB datasets while providing better resilience against adversarial examples than the best existing defenses and, in most cases, lower classification-time overhead.
This paper proposes a new defense called $n$-ML against adversarial examples, i.e., inputs crafted by perturbing benign inputs by small amounts to induce misclassifications by classifiers. Inspired by $n$-version programming, $n$-ML trains an ensemble of $n$ classifiers, and inputs are classified by a vote of the classifiers in the ensemble. Unlike prior such approaches, however, the classifiers in the ensemble are trained specifically to classify adversarial examples differently, rendering it very difficult for an adversarial example to obtain enough votes to be misclassified. We show that $n$-ML roughly retains the benign classification accuracies of state-of-the-art models on the MNIST, CIFAR10, and GTSRB datasets, while simultaneously defending against adversarial examples with better resilience than the best defenses known to date and, in most cases, with lower classification-time overhead.
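The voting step described above can be illustrated with a minimal sketch. The abstract does not spell out the vote rule, so everything here is an assumption made for illustration: the names `ensemble_vote` and `min_votes`, the choice to return `None` when no label collects enough votes, and the toy lookup-table classifiers that stand in for trained models.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence

# Assumption: a classifier is any callable mapping an input to a label.
Classifier = Callable[[Hashable], Hashable]


def ensemble_vote(
    classifiers: Sequence[Classifier],
    x: Hashable,
    min_votes: int,
) -> Optional[Hashable]:
    """Return the label with the most votes if it has at least `min_votes`.

    Returns None when no label reaches the threshold (hypothetical
    rejection behavior, not taken from the paper).
    """
    if not 1 <= min_votes <= len(classifiers):
        raise ValueError("min_votes must be between 1 and the ensemble size")
    votes = Counter(clf(x) for clf in classifiers)
    label, count = votes.most_common(1)[0]
    return label if count >= min_votes else None


def make_toy_classifier(table: dict) -> Classifier:
    """Build a stand-in classifier from a fixed input-to-label table."""
    return lambda x: table[x]


# Three toy classifiers that agree on the benign input but were
# (hypothetically) trained to disagree on the adversarial one.
toy_ensemble = [
    make_toy_classifier({"benign": "stop", "adversarial": "stop"}),
    make_toy_classifier({"benign": "stop", "adversarial": "yield"}),
    make_toy_classifier({"benign": "stop", "adversarial": "speed_limit"}),
]

benign_result = ensemble_vote(toy_ensemble, "benign", min_votes=2)
adversarial_result = ensemble_vote(toy_ensemble, "adversarial", min_votes=2)
print(benign_result)       # stop
print(adversarial_result)  # None
```

In this toy setup the benign input receives three matching votes and is accepted, while the adversarial input splits the votes three ways and no label reaches the threshold. Choosing `min_votes` above half the ensemble size avoids ties between two labels.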