Fast and Scalable Adversarial Training of Kernel SVM via Doubly Stochastic Gradients
This work addresses adversarial robustness for kernel SVM, a classical learning algorithm, filling a gap in research that has focused mainly on deep neural networks. The contribution is incremental, however, because it adapts existing adversarial training techniques to one specific model.
The paper tackles adversarial attacks on kernel SVM by proposing adv-SVM, a fast and scalable adversarial training method. The method connects sample perturbations in the original space to those in the kernel space and optimizes the resulting objective with doubly stochastic gradients (DSG). It converges at a rate of O(1/t) under both constant and diminishing step sizes, is robust against various attacks, and has efficiency comparable to the classical DSG algorithm.
Adversarial attacks, which generate examples that are almost indistinguishable from natural examples, pose a serious threat to learning models. Defending against adversarial attacks is a critical element of a reliable learning system. Support vector machine (SVM) is a classical yet still important learning algorithm, even in the current deep learning era. Although a wide range of research has been done in recent years to improve the adversarial robustness of learning models, most of it is limited to deep neural networks (DNNs), and work on kernel SVM is still lacking. In this paper, we focus on kernel SVM and propose adv-SVM to improve its adversarial robustness via adversarial training, which has been demonstrated to be one of the most promising defense techniques. To the best of our knowledge, this is the first work devoted to the fast and scalable adversarial training of kernel SVM. Specifically, we first build a connection between perturbations of samples in the original space and in the kernel space, and then give a reduced and equivalent formulation of adversarial training of kernel SVM based on this connection. Next, doubly stochastic gradients (DSG), based on two unbiased stochastic approximations (one over training points and the other over random features), are applied to update the solution of our objective function. Finally, we prove that our algorithm optimized by DSG converges to the optimal solution at the rate of O(1/t) under both constant and diminishing step sizes. Comprehensive experimental results show that our adversarial training algorithm is robust against various attacks while having efficiency and scalability similar to those of the classical DSG algorithm.
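To make the "doubly stochastic" idea concrete, the following is a minimal sketch of a plain DSG update for a hinge-loss kernel SVM, not the adv-SVM algorithm itself. The abstract does not state the reduced adversarial objective, so the perturbation term is omitted here. The RBF kernel, the random Fourier feature map, the regularization weight `lam`, the step-size schedule, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def random_feature(seed, dim, sigma):
    """Regenerate the random Fourier feature for one iteration from its seed.

    Assumes an RBF kernel exp(-||x - x'||^2 / (2 * sigma^2)).
    """
    rng = np.random.default_rng(seed)
    omega = rng.normal(0.0, 1.0 / sigma, size=dim)  # frequency vector
    b = rng.uniform(0.0, 2.0 * np.pi)               # random phase
    return omega, b

def phi(x, omega, b):
    """Single random feature; E[phi(x) * phi(x')] equals the RBF kernel value."""
    return np.sqrt(2.0) * np.cos(x @ omega + b)

def predict(x, alphas, dim, sigma):
    """Evaluate f(x) = sum_i alpha_i * phi_i(x), regenerating features by seed."""
    return sum(a * phi(x, *random_feature(i, dim, sigma))
               for i, a in enumerate(alphas))

def dsg_train(X, y, n_iters=200, lam=1e-3, step=1.0, sigma=1.0, seed=12345):
    """Plain DSG for a hinge-loss kernel SVM (no adversarial term)."""
    n, dim = X.shape
    data_rng = np.random.default_rng(seed)
    alphas = []
    for t in range(n_iters):
        i = data_rng.integers(n)                  # randomness 1: a training point
        omega, b = random_feature(t, dim, sigma)  # randomness 2: a random feature
        gamma = step / (1.0 + t)                  # diminishing step size (assumed schedule)
        f_x = predict(X[i], alphas, dim, sigma)
        # Subgradient of the hinge loss max(0, 1 - y * f) with respect to f.
        grad = -y[i] if y[i] * f_x < 1.0 else 0.0
        # Shrink old coefficients (L2 regularization), then append the new one.
        alphas = [(1.0 - gamma * lam) * a for a in alphas]
        alphas.append(-gamma * grad * phi(X[i], omega, b))
    return alphas

if __name__ == "__main__":
    X = np.array([[2.0, 2.0], [-2.0, -2.0], [2.5, 1.5], [-1.5, -2.5]])
    y = np.array([1.0, -1.0, 1.0, -1.0])
    alphas = dsg_train(X, y, n_iters=100)
    print([float(predict(x, alphas, 2, 1.0)) for x in X])
```

Only the coefficients are stored. Each random feature is regenerated from its iteration index used as a seed, which keeps memory independent of the feature dimension. An adversarial variant would additionally modify the loss evaluated at each sampled point, following the paper's reduced formulation.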