LG, CV | Sep 5, 2024

A practical approach to evaluating the adversarial distance for machine learning classifiers

Georg Siedel, Ekagra Gupta, Andrey Morozov

arXiv:2409.03598v1 | 2.61 citations | h-index: 1 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for more informative robustness evaluations in ML, though it appears incremental as it builds on existing adversarial attack and certification techniques.

The paper addresses the evaluation of adversarial robustness in machine learning classifiers by proposing a method that estimates the adversarial distance using iterative attacks and certification. It finds that the attack approach is effective, whereas the certification method underperforms.

Robustness is critical for machine learning (ML) classifiers to ensure consistent performance in real-world applications where models may encounter corrupted or adversarial inputs. In particular, assessing the robustness of classifiers to adversarial inputs is essential to protect systems from vulnerabilities and thus ensure safety in use. However, methods to accurately compute adversarial robustness have been challenging for complex ML models and high-dimensional data. Furthermore, evaluations typically measure adversarial accuracy on specific attack budgets, limiting the informative value of the resulting metrics. This paper investigates the estimation of the more informative adversarial distance using iterative adversarial attacks and a certification approach. Combined, the methods provide a comprehensive evaluation of adversarial robustness by computing estimates for the upper and lower bounds of the adversarial distance. We present visualisations and ablation studies that provide insights into how this evaluation method should be applied and parameterised. We find that our adversarial attack approach is effective compared to related implementations, while the certification method falls short of expectations. The approach in this paper should encourage a more informative way of evaluating the adversarial robustness of ML classifiers.
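The idea of bracketing the adversarial distance between an attack-based upper bound and a certified lower bound can be illustrated on a toy problem. The sketch below is not the paper's attack or certification method. It assumes a linear binary classifier, for which the exact L2 distance to the decision boundary is known in closed form, and all function names are hypothetical. The attack walks towards the boundary in fixed steps until the prediction flips, and the certificate divides the score by a deliberately loose Lipschitz constant.

```python
import math

def score(w, b, x):
    """Signed score of a linear binary classifier; its sign is the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def attack_upper_bound(w, b, x, step=0.01, max_steps=10_000):
    """Step towards the decision boundary until the prediction flips.

    The L2 norm of the first perturbation that flips the prediction is an
    upper bound on the adversarial distance. Returns None if no flip is found.
    """
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    start_sign = 1.0 if score(w, b, x) >= 0 else -1.0
    # Unit-norm direction that reduces the margin of the current prediction.
    direction = [-start_sign * wi / norm_w for wi in w]
    for k in range(1, max_steps + 1):
        x_adv = [xi + k * step * di for xi, di in zip(x, direction)]
        adv_sign = 1.0 if score(w, b, x_adv) >= 0 else -1.0
        if adv_sign != start_sign:
            return k * step  # direction has unit norm, so ||delta||_2 = k * step
    return None

def certified_lower_bound(w, b, x):
    """Toy certificate: |score| / L with a pessimistic Lipschitz constant L.

    ||w||_1 >= ||w||_2, so ||w||_1 is a valid but loose Lipschitz constant
    with respect to L2, and the radius never exceeds the true distance.
    """
    return abs(score(w, b, x)) / sum(abs(wi) for wi in w)

def exact_distance(w, b, x):
    """Closed-form L2 distance to the boundary (available only in this toy case)."""
    return abs(score(w, b, x)) / math.sqrt(sum(wi * wi for wi in w))

# Illustrative values only (assumptions, not data from the paper).
w, b, x = [3.0, 4.0], -1.0, [1.0, 1.005]
lower = certified_lower_bound(w, b, x)
upper = attack_upper_bound(w, b, x)
exact = exact_distance(w, b, x)
print(f"lower={lower:.3f}  exact={exact:.3f}  upper={upper:.3f}")
```

In this toy setting the attack bound overshoots the exact distance by at most one step size, while the certified bound is looser because of the pessimistic constant. For deep networks the exact distance is generally unavailable, which is why the two bounds are reported together.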
