Asymptotic Behavior of Adversarial Training Estimator under $\ell_\infty$-Perturbation
This work addresses the problem of improving the robustness and sparsity of machine learning models under adversarial attacks, and is aimed at researchers and practitioners in adversarial machine learning.
This paper investigates the asymptotic behavior of adversarial training estimators under $\ell_\infty$-perturbation in generalized linear models. It shows that the asymptotic distribution of the estimator can put positive probability mass at zero when the true parameter is zero, which gives a theoretical sparsity-recovery guarantee, and it proposes an adaptive adversarial training method that achieves asymptotic variable-selection consistency and unbiasedness.
Adversarial training has been proposed to protect machine learning models against adversarial attacks. This paper focuses on adversarial training under $\ell_\infty$-perturbation, which has recently attracted much research attention. The asymptotic behavior of the adversarial training estimator is investigated in the generalized linear model. The results imply that the asymptotic distribution of the adversarial training estimator under $\ell_\infty$-perturbation can put a positive probability mass at $0$ when the true parameter is $0$, providing a theoretical guarantee of the associated sparsity-recovery ability. In addition, a two-step procedure, adaptive adversarial training, is proposed, which can further improve the performance of adversarial training under $\ell_\infty$-perturbation. Specifically, the proposed procedure can achieve asymptotic variable-selection consistency and unbiasedness. Numerical experiments are conducted to show the sparsity-recovery ability of adversarial training under $\ell_\infty$-perturbation and to compare the empirical performance of classic adversarial training and adaptive adversarial training.
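
As a rough illustration of where the sparsity comes from, consider the special case of linear regression with squared loss. This is an assumption made here for simplicity, since the abstract treats generalized linear models more broadly. In this case the inner maximization over the perturbation has a closed form, $\max_{\|\delta\|_\infty \le \epsilon} (y - (x+\delta)^\top \beta)^2 = (|y - x^\top \beta| + \epsilon \|\beta\|_1)^2$, so the adversarial loss carries an $\ell_1$-type term in $\beta$. The sketch below checks this identity numerically by enumerating the corners of the $\ell_\infty$ ball. The function names and the toy values are hypothetical and are not taken from the paper.

```python
import itertools
import numpy as np


def adv_squared_loss(beta, x, y, eps):
    """Closed-form worst-case squared loss under an l_inf perturbation of x."""
    return (abs(y - x @ beta) + eps * np.abs(beta).sum()) ** 2


def adv_squared_loss_bruteforce(beta, x, y, eps):
    """Same quantity, found by enumerating the corners of the l_inf ball.

    The squared loss is convex in delta, so its maximum over the ball
    is attained at one of the 2^d corners.
    """
    worst = 0.0
    for signs in itertools.product([-1.0, 1.0], repeat=len(x)):
        delta = eps * np.array(signs)
        worst = max(worst, (y - (x + delta) @ beta) ** 2)
    return worst


# Toy inputs (illustrative only); the second coefficient is exactly zero.
rng = np.random.default_rng(0)
beta = np.array([1.5, 0.0, -2.0])
x = rng.normal(size=3)
y = 0.7
eps = 0.1

closed = adv_squared_loss(beta, x, y, eps)
brute = adv_squared_loss_bruteforce(beta, x, y, eps)
```

The two values `closed` and `brute` should agree up to floating-point error, and setting `eps = 0` reduces `adv_squared_loss` to the ordinary squared loss. The brute-force check enumerates $2^d$ corners, so it is only practical for small dimensions.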