CV · Nov 27, 2023

Adversarial Doodles: Interpretable and Human-drawable Attacks Provide Describable Insights

arXiv:2311.15994v3 · 1.5 · h-index: 3

Originality: Incremental advance

AI Analysis

The paper addresses the need for interpretable adversarial attacks that give insight into classifier mechanisms. The advance is incremental: it builds on existing attack methods by adding interpretability.

The paper tackles the lack of interpretability in adversarial attacks on DNN-based image classifiers. It proposes Adversarial Doodles, which use interpretable Bézier curves to fool classifiers, yielding small attacks that remain effective when humans replicate them by hand.

DNN-based image classifiers are susceptible to adversarial attacks. Most previous adversarial attacks do not have clear patterns, making it difficult to interpret the attacks' results and gain insights into classifiers' mechanisms. Therefore, we propose Adversarial Doodles, which have interpretable shapes. We optimize black Bézier curves to fool the classifier by overlaying them onto the input image. By introducing random affine transformations and regularizing the doodled area, we obtain small-sized attacks that cause misclassification even when humans replicate them by hand. Adversarial doodles provide describable insights into the relationship between the human-drawn doodle's shape and the classifier's output, such as "When we add three small circles on a helicopter image, the ResNet-50 classifier mistakenly classifies it as an airplane."
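The overlay step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the cubic curve degree, the point-sampling rasterizer, the scale-plus-shift form of the random transformation, and all function names (cubic_bezier, draw_doodle, doodled_area, random_affine) are assumptions made for illustration. The classifier and the optimization loop are omitted; the sketch only shows how black curves could be drawn onto an image, how the doodled area that the regularizer penalizes could be measured, and how control points could be jittered to mimic imprecise hand drawing.

```python
import random


def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
    return x, y


def draw_doodle(image, curves, samples=200):
    """Return a copy of a grayscale image with the curves drawn in black (0.0),
    together with a boolean mask of the pixels that were covered."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    mask = [[False] * w for _ in range(h)]
    for ctrl in curves:
        for i in range(samples + 1):
            x, y = cubic_bezier(*ctrl, i / samples)
            col, row = int(round(x)), int(round(y))
            if 0 <= row < h and 0 <= col < w:
                out[row][col] = 0.0
                mask[row][col] = True
    return out, mask


def doodled_area(mask):
    """Fraction of pixels covered by the doodle (the quantity to keep small)."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


def random_affine(curves, rng, max_shift=1.0, max_scale=0.05):
    """Apply one random scale and translation to every control point.
    A restricted affine map, assumed here for simplicity (no rotation or shear)."""
    s = 1.0 + rng.uniform(-max_scale, max_scale)
    dx = rng.uniform(-max_shift, max_shift)
    dy = rng.uniform(-max_shift, max_shift)
    return [tuple((s * x + dx, s * y + dy) for x, y in ctrl) for ctrl in curves]


# Usage: one curve on a blank 32x32 white image.
image = [[1.0] * 32 for _ in range(32)]
curve = ((4, 4), (12, 28), (20, 0), (28, 24))
doodled, mask = draw_doodle(image, [curve])
area = doodled_area(mask)
jittered = random_affine([curve], random.Random(0))
```

In an attack loop, the control points would be the variables being optimized, with a loss that combines the classifier's output on the doodled image and a penalty on the doodled area, averaged over several random transformations.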
