Achieve Optimal Adversarial Accuracy for Adversarial Deep Learning using Stackelberg Game
This work addresses the practical challenge of adversarial robustness in deep learning for security-critical applications and provides theoretical guarantees for optimal adversarial accuracy. The contribution is incremental, however: it extends existing game-theoretic formulations from simultaneous to sequential games.
The paper tackles the problem of training robust deep neural networks (DNNs) against adversarial attacks by formulating adversarial deep learning as sequential Stackelberg games, proving the existence of equilibria and showing that the equilibrium DNN achieves the largest adversarial accuracy among DNNs with the same structure when using Carlini-Wagner's margin loss.
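A short pure-Python sketch can make adversarial accuracy and the Carlini-Wagner margin concrete. It assumes the unclipped margin z_y - max_{j != y} z_j. The original Carlini-Wagner objective also clips this margin at a confidence parameter kappa, and the paper's exact definition may differ. The sketch also approximates the adversary by a finite set of perturbations. The helper names `cw_margin`, `adversarial_accuracy` and `toy_model`, and the data, are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]

def cw_margin(logits: Sequence[float], label: int) -> float:
    """Unclipped Carlini-Wagner-style margin: z_y - max_{j != y} z_j.

    Positive: the true class has the strictly largest logit.
    Zero or negative: some other class ties with or beats the true class.
    """
    runner_up = max(z for j, z in enumerate(logits) if j != label)
    return logits[label] - runner_up

def adversarial_accuracy(
    model: Callable[[Vector], Vector],
    data: Sequence[Tuple[Vector, int]],
    perturbations: Sequence[Vector],
) -> float:
    """Fraction of samples whose worst-case margin over the finite perturbation set stays positive."""
    robust = 0
    for x, y in data:
        # The adversary picks the perturbation that minimizes the margin of the true label.
        worst = min(
            cw_margin(model([xi + di for xi, di in zip(x, d)]), y)
            for d in perturbations
        )
        if worst > 0:
            robust += 1
    return robust / len(data)

def toy_model(x: Vector) -> Vector:
    # Hypothetical 2-class linear model on a 1-D input: logits (x, -x).
    return [x[0], -x[0]]

data = [([1.0], 0), ([-1.0], 1), ([0.2], 0)]
perturbations = [[-0.5], [0.5]]  # endpoints of a 1-D L-infinity ball with eps = 0.5

print(adversarial_accuracy(toy_model, data, perturbations))  # 2 of 3 samples stay robust
```

All three samples are classified correctly without perturbation. However, the point at 0.2 lies within the perturbation radius of the decision boundary at 0, so only two of the three samples stay correct under attack. For this linear model, enumerating the endpoints of the ball is exact because the margin is linear in the perturbation. For a real DNN, the inner minimization would need an attack such as PGD. Such an attack yields only an upper bound on the true adversarial accuracy.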
Adversarial deep learning aims to train DNNs that are robust against adversarial attacks and is one of the major research focuses of deep learning. Game theory has been used to answer some of the basic questions about adversarial deep learning, such as the existence of a classifier with optimal robustness and the existence of optimal adversarial samples for a given class of classifiers. In most previous work, adversarial deep learning was formulated as a simultaneous game, and the strategy spaces were assumed to be certain probability distributions so that a Nash equilibrium exists. However, this assumption does not hold in the practical setting. In this paper, we answer these basic questions for the practical case where the classifiers are DNNs with a given structure, by formulating adversarial deep learning as sequential games. The existence of Stackelberg equilibria for these games is proved. Furthermore, it is shown that the equilibrium DNN has the largest adversarial accuracy among all DNNs with the same structure when Carlini-Wagner's margin loss is used. The trade-off between robustness and accuracy in adversarial deep learning is also studied from a game-theoretic perspective.
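Finite strategy sets can illustrate the difference between the sequential (Stackelberg) and simultaneous formulations. In this hypothetical sketch, the leader is a 1-D linear classifier chosen from a small grid of (w, b) values. The follower is an adversary that observes the leader's choice and then picks a per-sample perturbation from {-eps, +eps}. The leader's payoff is taken directly to be robust accuracy. This is a simplification, not the paper's construction, which uses the Carlini-Wagner margin loss as the payoff over continuous DNN parameter spaces. All function names, grids and data below are illustrative assumptions.

```python
from typing import Sequence, Tuple

Params = Tuple[float, float]  # hypothetical leader strategy: (w, b) of a 1-D linear classifier

def margin(params: Params, x: float, y: int) -> float:
    """CW-style margin for a 2-class linear model with logits (s, -s), where s = w*x + b."""
    w, b = params
    s = w * x + b
    return 2 * s if y == 0 else -2 * s

def best_response(params: Params, x: float, y: int, perturbations: Sequence[float]) -> float:
    """Follower moves second: it sees the leader's params and picks the margin-minimizing perturbation."""
    return min(perturbations, key=lambda d: margin(params, x + d, y))

def leader_payoff(params: Params, data: Sequence[Tuple[float, int]],
                  perturbations: Sequence[float]) -> float:
    """Robust accuracy of `params` when every sample is attacked by the follower's best response."""
    correct = sum(
        1 for x, y in data
        if margin(params, x + best_response(params, x, y, perturbations), y) > 0
    )
    return correct / len(data)

def stackelberg_leader(grid: Sequence[Params], data: Sequence[Tuple[float, int]],
                       perturbations: Sequence[float]) -> Params:
    """Backward induction: the leader maximizes its payoff, anticipating the follower's reply."""
    return max(grid, key=lambda p: leader_payoff(p, data, perturbations))

def pure_maxmin_minmax(grid: Sequence[Params], x: float, y: int,
                       perturbations: Sequence[float]) -> Tuple[float, float]:
    """Leader-first (max-min) and adversary-first (min-max) values of a one-sample margin game."""
    maxmin = max(min(margin(p, x + d, y) for d in perturbations) for p in grid)
    minmax = min(max(margin(p, x + d, y) for p in grid) for d in perturbations)
    return maxmin, minmax

data = [(1.0, 0), (-1.0, 1), (0.7, 0), (-0.4, 1)]
grid = [(w, b) for w in (-1.0, 1.0) for b in (-0.5, -0.15, 0.0, 0.5)]
eps = [-0.5, 0.5]

leader = stackelberg_leader(grid, data, eps)
print(leader, leader_payoff(leader, data, eps))

# A matching-pennies-like sample on the decision boundary.
print(pure_maxmin_minmax([(1.0, 0.0), (-1.0, 0.0)], 0.0, 0, [-2.0, 2.0]))
```

In this toy setup, the leader's choice (1.0, -0.15) keeps all four samples correct under the follower's best response. The last call shows why a simultaneous formulation may need mixed strategies. For a sample on the decision boundary, the leader-first value (-4.0) is strictly below the adversary-first value (4.0), so no pair of pure strategies forms a Nash equilibrium. By contrast, the leader-first max-min over finite sets always exists. The paper's existence result for continuous parameter spaces requires its own argument and is not reproduced here.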