Directional Adversarial Training for Cost Sensitive Deep Learning Classification Applications
This work addresses the problem of balancing robustness and accuracy in deep learning for real-world applications, offering an incremental improvement over existing adversarial training methods.
The paper tackles the trade-off between robustness and accuracy in adversarial training by proposing Wasserstein Projected Gradient Descent (WPGD), which achieves cost-sensitive robustness and finer control over this trade-off, validated on image recognition tasks with benchmark datasets.
In many real-world applications of machine learning, it is of paramount importance not only to provide accurate predictions but also to ensure a certain level of robustness. Adversarial training is a training procedure that aims to produce models that are robust to worst-case perturbations around predefined points. Unfortunately, one of the main issues in adversarial training is that robustness w.r.t. gradient-based attackers is always achieved at the cost of prediction accuracy. In this paper, a new algorithm for adversarial training, called Wasserstein Projected Gradient Descent (WPGD), is proposed. WPGD provides a simple way to obtain cost-sensitive robustness, resulting in finer control of the robustness-accuracy trade-off. Moreover, WPGD solves an optimal transport problem on the output space of the network and can efficiently discover the directions in which robustness is required, making it possible to control the directional trade-off between accuracy and robustness. The proposed WPGD is validated on image recognition tasks with different benchmark datasets and architectures. In addition, real-world datasets are often unbalanced: this paper shows that, on such datasets, adversarial training is affected mainly in terms of standard accuracy.
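To make the idea of a cost-sensitive, directional attack concrete, the following is a minimal sketch and not the WPGD algorithm of the paper. It assumes a toy linear softmax classifier, a hypothetical cost matrix C in which C[j][y] is the cost of predicting class j when the true class is y, and an L-infinity perturbation budget eps. It relies on one fact about optimal transport: when the target distribution is a point mass on the true label, the only feasible plan moves all predicted mass to that label, so the transport cost reduces to sum_j p[j] * C[j][y]. The attack then runs projected gradient ascent on this cost. All names and values (W, C, eps, step, iters) are illustrative assumptions.

```python
import math

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(W, x):
    # Toy linear model (assumption): logits = W x, followed by softmax.
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in W])

def transport_cost(p, y, C):
    # Optimal transport cost from p to a point mass on class y:
    # all mass p[j] must move to y, at unit cost C[j][y].
    return sum(p[j] * C[j][y] for j in range(len(p)))

def cost_grad_x(W, x, y, C):
    # Analytic gradient of the transport cost w.r.t. the input x.
    p = predict(W, x)
    loss = transport_cost(p, y, C)
    # d(loss)/d(logit_k) = p_k * (C[k][y] - loss), from the softmax Jacobian.
    dz = [p[k] * (C[k][y] - loss) for k in range(len(p))]
    # Chain rule through the linear layer.
    return [sum(dz[k] * W[k][i] for k in range(len(p))) for i in range(len(x))]

def sign(v):
    return (v > 0) - (v < 0)

def pgd_attack(W, x, y, C, eps=0.5, step=0.1, iters=20):
    x_adv = list(x)
    for _ in range(iters):
        g = cost_grad_x(W, x_adv, y, C)
        # Signed gradient ascent step on the transport cost.
        x_adv = [xa + step * sign(gi) for xa, gi in zip(x_adv, g)]
        # Project back onto the L-infinity ball of radius eps around x.
        x_adv = [min(max(xa, xo - eps), xo + eps) for xa, xo in zip(x_adv, x)]
    return x_adv

# Hypothetical 3-class, 2-feature example: confusing class 0 with class 2
# is five times as costly as confusing it with class 1.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
C = [[0.0, 1.0, 5.0], [1.0, 0.0, 1.0], [5.0, 1.0, 0.0]]
x, y = [1.0, 0.2], 0
eps = 0.5

x_adv = pgd_attack(W, x, y, C, eps=eps)
clean_cost = transport_cost(predict(W, x), y, C)
adv_cost = transport_cost(predict(W, x_adv), y, C)
print("clean cost:", clean_cost, "adversarial cost:", adv_cost)
```

In an adversarial training loop, examples such as x_adv would be fed back into the training objective. Under the assumptions of this sketch, raising individual entries of C steers the attack, and hence the robustness gained from training against it, toward the most costly confusions.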