AIApr 19, 2024

How Real Is Real? A Human Evaluation Framework for Unrestricted Adversarial Examples

Dren Fazlija, Arkadij Orlov, Johanna Schrader, Monty-Maximilian Zühlke, Michael Rohs, Daniel Kudenko

arXiv:2404.12653v14.21 citationsh-index: 2

Originality Synthesis-oriented

AI Analysis

This addresses the need for better human evaluation in adversarial machine learning, particularly for unrestricted attacks that could bypass existing defenses, though it is incremental as it builds on existing assessment frameworks.

The paper tackles the problem of evaluating the human imperceptibility of unrestricted adversarial examples in images, which lack traditional perturbation constraints, by proposing SCOOTER, a framework for conducting statistically significant human experiments with standardized questions and an implementation.

With an ever-increasing reliance on machine learning (ML) models in the real world, adversarial examples threaten the safety of AI-based systems such as autonomous vehicles. In the image domain, they represent maliciously perturbed data points that look benign to humans (i.e., the image modification is not noticeable) but greatly mislead state-of-the-art ML models. Previously, researchers ensured the imperceptibility of their altered data points by restricting perturbations via $\ell_p$ norms. However, recent publications claim that creating natural-looking adversarial examples without such restrictions is also possible. With much more freedom to instill malicious information into data, these unrestricted adversarial examples can potentially overcome traditional defense strategies as they are not constrained by the limitations or patterns these defenses typically recognize and mitigate. This allows attackers to operate outside of expected threat models. However, surveying existing image-based methods, we noticed a need for more human evaluations of the proposed image modifications. Based on existing human-assessment frameworks for image generation quality, we propose SCOOTER - an evaluation framework for unrestricted image-based attacks. It provides researchers with guidelines for conducting statistically significant human experiments, standardized questions, and a ready-to-use implementation. We propose a framework that allows researchers to analyze how imperceptible their unrestricted attacks truly are.

View on arXiv PDF

Similar