AI, Nov 11, 2024

Adversarial Detection with a Dynamically Stable System

arXiv:2411.06666v12.3h-index: 2

Originality: Highly original

AI Analysis

This work addresses the unreliability of existing adversarial detection methods against new attacks, offering a more robust approach to securing AI systems.

The paper tackles the problem of detecting adversarial examples in machine learning models by proposing a Dynamically Stable System (DSS) that uses stability analysis, achieving ROC-AUC values of 99.83%, 97.81%, and 94.47% on MNIST, CIFAR10, and CIFAR100 datasets, surpassing state-of-the-art methods.
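
For readers unfamiliar with the metric, ROC-AUC for a detector can be read as the probability that a randomly chosen adversarial example receives a higher detection score than a randomly chosen normal example, with ties counted as half. The sketch below computes it by direct pairwise comparison. The function name `roc_auc` and the score lists are hypothetical illustrations and are not data from the paper.

```python
def roc_auc(normal_scores, adversarial_scores):
    """Pairwise ROC-AUC: fraction of (adversarial, normal) pairs ranked correctly.

    A higher score is assumed to mean "more likely adversarial".
    Ties between an adversarial and a normal score count as half a win.
    """
    wins = 0.0
    for a in adversarial_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(adversarial_scores) * len(normal_scores))


# Hypothetical detector scores, invented for illustration only.
normal = [0.1, 0.2, 0.35, 0.4]
adversarial = [0.3, 0.6, 0.8, 0.9]

auc = roc_auc(normal, adversarial)
print(f"ROC-AUC: {auc:.3f}")
```

A value of 1.0 would mean every adversarial example outscores every normal one, and 0.5 corresponds to random guessing.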

Adversarial detection is designed to identify and reject maliciously crafted adversarial examples (AEs), which are generated to disrupt the classification of target models. At present, various input transformation-based methods have been developed for adversarial example detection, but they typically rely on empirical experience and are therefore unreliable against new attacks. To address this issue, we propose a Dynamically Stable System (DSS), which distinguishes adversarial examples from normal examples according to the stability of the input examples. In particular, we model the generation of adversarial examples as the perturbation process of a Lyapunov dynamic system, and we propose an example stability mechanism in which a novel control term is added to adversarial example generation to ensure that normal examples can achieve dynamic stability while adversarial examples cannot. Based on this example stability mechanism, the DSS uses disruption and restoration actions to determine the stability of input examples and detects adversarial examples through changes in that stability. In a comparison with existing methods on three benchmark datasets (MNIST, CIFAR10, and CIFAR100), our evaluation results show that the proposed DSS achieves ROC-AUC values of 99.83%, 97.81%, and 94.47%, surpassing the state-of-the-art (SOTA) values of 97.35%, 91.10%, and 93.49% achieved by the seven other methods.
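
The disruption-and-restoration idea in the abstract above can be illustrated with a toy example. This is only a minimal sketch under strong assumptions, not the paper's DSS: it uses a hypothetical two-dimensional linear classifier, replaces the Lyapunov-based control term with a fixed-size step along the weight vector, and treats an input as "stable" if its predicted label survives one disruption followed by one restoration. All names (`score`, `predict`, `step`, `is_stable`) and all numbers are invented for illustration.

```python
def score(w, b, x):
    """Signed margin of a toy linear classifier; its sign gives the label."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the decision boundary."""
    return 1 if score(w, b, x) >= 0 else -1


def step(w, x, direction, size):
    """Move x by `size` along the unit weight vector.

    direction=+1 pushes toward the positive class, -1 toward the negative.
    """
    norm = sum(wi * wi for wi in w) ** 0.5
    return [xi + direction * size * wi / norm for xi, wi in zip(x, w)]


def is_stable(w, b, x, size=0.5):
    """Toy stability check (an assumption, not the paper's mechanism).

    1. Disruption: push x against its original prediction.
    2. Restoration: reinforce whatever the model predicts after disruption.
    The input counts as stable if the final label equals the original one.
    """
    original = predict(w, b, x)
    disrupted = step(w, x, -original, size)
    current = predict(w, b, disrupted)
    restored = step(w, disrupted, current, size)
    return predict(w, b, restored) == original


# Hypothetical classifier: the label depends only on the first coordinate.
w, b = [1.0, 0.0], 0.0

normal_example = [2.0, 0.5]       # far from the decision boundary
adversarial_example = [0.1, 0.5]  # barely pushed across the boundary

print(is_stable(w, b, normal_example))       # True
print(is_stable(w, b, adversarial_example))  # False
```

In this toy setting the normal example sits far from the decision boundary, so the disruption does not flip its label and the restoration brings it back. The adversarial example sits just across the boundary, so the same disruption flips its label and the restoration then reinforces the flipped label.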

