LG Feb 8, 2025

Democratic Training Against Universal Adversarial Perturbations

arXiv:2502.05542v17.11 citations h-index: 4 ICLR

Originality: Highly original

AI Analysis

This work addresses a significant problem for developers of security-sensitive systems that rely on deep neural networks, providing an incremental solution to mitigate universal adversarial perturbations.

The authors tackled the problem of universal adversarial perturbations in deep neural networks and proposed a defense method called Democratic Training, which effectively reduces the attack success rate and improves model robustness. The method was evaluated on 7 neural networks and 5 benchmark datasets, showing promising results.

Despite their advances and success, real-world deep neural networks are known to be vulnerable to adversarial attacks. Universal adversarial perturbation (UAP), an input-agnostic attack, poses a serious threat to their deployment in security-sensitive systems. A single universal adversarial perturbation deceives the model on a range of clean inputs without requiring input-specific optimization, which makes it particularly threatening. In this work, we observe that universal adversarial perturbations usually lead to an abnormal entropy spectrum in hidden layers, which suggests that in such cases the prediction is dominated by a small number of "features" rather than decided democratically by many features. Inspired by this, we propose an efficient yet effective defense method for mitigating UAPs called Democratic Training, which performs entropy-based model enhancement to suppress the effect of universal adversarial perturbations in a given model. Democratic Training is evaluated with 7 neural networks trained on 5 benchmark datasets and 5 types of state-of-the-art universal adversarial attack methods. The results show that it effectively reduces the attack success rate, improves model robustness, and preserves model accuracy on clean samples.
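The intuition behind the entropy observation can be illustrated with a small sketch. It is not the paper's method: the abstract does not define how the entropy spectrum is computed, so the function below assumes a simple Shannon entropy over a layer's activation magnitudes normalized to sum to one. The name `activation_entropy` and the two example activation vectors are hypothetical. Under that assumption, a layer whose output is dominated by one feature has much lower entropy than a layer where many features contribute equally.

```python
import math


def activation_entropy(activations):
    """Shannon entropy (in nats) of one layer's activations.

    Assumption for illustration only: activation magnitudes are
    normalized to sum to 1 and treated as a probability distribution.
    The paper's actual entropy definition may differ.
    """
    magnitudes = [abs(a) for a in activations]
    total = sum(magnitudes)
    if total == 0:
        # No active feature at all: define the entropy as 0.
        return 0.0
    probs = [m / total for m in magnitudes]
    # Skip zero-probability entries, since p * log(p) -> 0 as p -> 0.
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical hidden-layer activations for two inputs.
democratic = [1.0, 1.0, 1.0, 1.0]   # many features contribute equally
dominated = [10.0, 0.1, 0.1, 0.1]   # one feature dominates the layer

print("democratic:", activation_entropy(democratic))
print("dominated: ", activation_entropy(dominated))
```

With four equally active features the entropy reaches its maximum of ln 4, while the dominated vector scores far lower. A defense in the spirit of the abstract would push a model away from the second regime, though how Democratic Training does so is specified in the paper rather than here.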
