LG | AI | Jan 28, 2023

Selecting Models based on the Risk of Damage Caused by Adversarial Attacks

arXiv:2301.12151v1 | 1 citation | h-index: 7
Originality: Incremental advance
AI Analysis

The paper addresses the need for actionable risk metrics in safety-critical AI applications. The advance is incremental because it builds on existing concerns about adversarial attacks.

The paper addresses how to assess the risk of damage from adversarial attacks on AI systems. It proposes a method to model and statistically estimate the probability of such damage. In the authors' experiments, the estimates outperform conventional metrics and support reliable selection of the lowest-risk model.

Regulation, legal liabilities, and societal concerns challenge the adoption of AI in safety and security-critical applications. One of the key concerns is that adversaries can cause harm by manipulating model predictions without being detected. Regulation hence demands an assessment of the risk of damage caused by adversaries. Yet, there is no method to translate this high-level demand into actionable metrics that quantify the risk of damage. In this article, we propose a method to model and statistically estimate the probability of damage arising from adversarial attacks. We show that our proposed estimator is statistically consistent and unbiased. In experiments, we demonstrate that the estimation results of our method have a clear and actionable interpretation and outperform conventional metrics. We then show how operators can use the estimation results to reliably select the model with the lowest risk.
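The abstract does not spell out the estimator, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each attack trial yields an independent binary outcome (damage or no damage) and estimates the damage probability as the sample proportion, which is unbiased and consistent under that i.i.d. assumption. The function names `estimate_damage_probability` and `select_lowest_risk`, the model names, and the trial data are all hypothetical.

```python
import math


def estimate_damage_probability(outcomes):
    """Estimate P(damage) as the fraction of trials that caused damage.

    `outcomes` is a sequence of booleans (or 0/1 values), one per attack
    trial. Assumption: trials are i.i.d. Bernoulli draws, which is an
    illustrative simplification and not taken from the paper.
    Returns (estimate, standard_error).
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one trial")
    p_hat = sum(bool(o) for o in outcomes) / n
    # Plug-in standard error of a Bernoulli proportion.
    std_err = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat, std_err


def select_lowest_risk(outcomes_by_model):
    """Return (best_model_name, estimates) for the lowest estimated risk."""
    estimates = {
        name: estimate_damage_probability(outcomes)[0]
        for name, outcomes in outcomes_by_model.items()
    }
    best = min(estimates, key=estimates.get)
    return best, estimates


if __name__ == "__main__":
    # Hypothetical trial results: True means the attack caused damage.
    trials = {
        "model_a": [True, True, False, False],
        "model_b": [False, False, False, True],
    }
    best, estimates = select_lowest_risk(trials)
    print(best, estimates)
```

With so few trials the standard errors are large, so in practice an operator would compare the estimates together with their uncertainty before committing to a model.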

Code Implementations: 1 repo
