A Unified Bilevel Model for Adversarial Learning and A Case Study
This work addresses the lack of a clear interpretation of, and clear metrics for, adversarial attacks in machine learning, particularly for clustering models. The contribution appears incremental, since it builds on existing adversarial learning concepts.
The authors tackle the problem of interpreting and measuring adversarial attacks in machine learning by proposing a unified bilevel model. Applying it to clustering models, they show that the clustering result is robust to relatively small data perturbations, whereas relatively large perturbations change the result and thus constitute an attack. They also analyse the well-definedness of a δ-measure for quantifying the effect of an attack.
Adversarial learning has been attracting increasing attention thanks to the rapid development of machine learning and artificial intelligence. However, because most machine learning models have a complicated structure, the mechanism of adversarial attacks is not well understood, and how to measure the effect of an attack remains unclear. In this paper, we propose a unified bilevel model for adversarial learning. We further investigate adversarial attacks on clustering models and interpret them from the data perturbation point of view. We reveal that when the data perturbation is relatively small, the clustering model is robust, whereas when it is relatively large, the clustering result changes, which constitutes an attack. To measure the effect of attacks on clustering models, we analyse the well-definedness of the so-called $\delta$-measure, which can be used in the proposed bilevel model for adversarial learning of clustering models.
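The small-versus-large perturbation behaviour described above can be illustrated with a toy experiment. The following is a minimal sketch under stated assumptions, not the paper's method: it uses a hand-written one-dimensional k-means (Lloyd's algorithm) with fixed initial centers, a made-up six-point data set, and a perturbation of a single point. The names `kmeans_1d`, `perturb`, and `changed_labels` are hypothetical, and the label-change count is only a simple stand-in for an effect measure; it is not the $\delta$-measure analysed in the paper.

```python
def kmeans_1d(points, init_centers, iters=100):
    """Toy 1D k-means (Lloyd's algorithm) with fixed initial centers.

    Returns the cluster label of each point. Fixed initialization keeps
    label indices comparable across runs (an assumption of this sketch).
    """
    centers = list(init_centers)
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: (x - centers[k]) ** 2)
            for x in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [x for x, lab in zip(points, labels) if lab == k]
            new_centers.append(sum(members) / len(members) if members else centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def perturb(points, index, eps):
    """Return a copy of the data with one point shifted by eps."""
    out = list(points)
    out[index] += eps
    return out


def changed_labels(a, b):
    """Number of points whose label differs (illustrative stand-in only)."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical data: two well-separated groups on the real line.
data = [0.0, 0.2, 0.4, 3.0, 3.2, 3.4]
init = [0.0, 3.4]

clean = kmeans_1d(data, init)
small = kmeans_1d(perturb(data, 2, 0.1), init)   # small perturbation
large = kmeans_1d(perturb(data, 2, 2.5), init)   # large perturbation

print("clean:", clean)
print("small perturbation, labels changed:", changed_labels(clean, small))
print("large perturbation, labels changed:", changed_labels(clean, large))
```

In this toy setting, shifting the third point by 0.1 leaves every label unchanged, while shifting it by 2.5 moves it into the other cluster. An attacker in a bilevel setting would search over such perturbations at the upper level while the clustering problem is solved at the lower level; here the perturbations are simply fixed by hand.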