LG AI NE

Jun 20, 2024

Exploring Layerwise Adversarial Robustness Through the Lens of t-SNE

Inês Valentim, Nuno Antunes, Nuno Lourenço

arXiv:2406.14073v1

4.62 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses vulnerabilities in neural networks for security applications, but it is incremental as it applies existing techniques to analyze robustness.

The paper assesses the adversarial robustness of image-classifying artificial neural networks, using t-SNE for visual inspection and a metric that compares clean and perturbed embeddings. On CIFAR-10, it finds that differences between the two emerge early, in the feature extraction layers.

Adversarial examples, designed to trick Artificial Neural Networks (ANNs) into producing wrong outputs, highlight vulnerabilities in these models. Exploring these weaknesses is crucial for developing defenses, and so, we propose a method to assess the adversarial robustness of image-classifying ANNs. The t-distributed Stochastic Neighbor Embedding (t-SNE) technique is used for visual inspection, and a metric, which compares the clean and perturbed embeddings, helps pinpoint weak spots in the layers. Analyzing two ANNs on CIFAR-10, one designed by humans and another via NeuroEvolution, we found that differences between clean and perturbed representations emerge early on, in the feature extraction layers, affecting subsequent classification. The findings with our metric are supported by the visual analysis of the t-SNE maps.
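The abstract does not define the metric, so the following is only a rough sketch of what a layerwise comparison of clean and perturbed t-SNE embeddings could look like, not the authors' implementation. The names `joint_tsne`, `embedding_shift`, `layerwise_shift`, and `demo` are hypothetical, and the metric itself (mean per-image displacement, scaled by the spread of the clean map) is an assumed stand-in. The sketch also assumes that clean and perturbed activations are embedded in one t-SNE run, because coordinates from separate runs are not directly comparable. It requires NumPy, plus scikit-learn for the t-SNE step.

```python
import numpy as np


def joint_tsne(clean_acts, pert_acts, perplexity=30.0, seed=0):
    """Embed clean and perturbed activations of one layer in a single t-SNE run.

    Both inputs have shape (n_samples, ...); row i of each refers to the same image.
    Returns two (n_samples, 2) arrays: the clean and the perturbed embedding.
    """
    from sklearn.manifold import TSNE  # lazy import: only this helper needs scikit-learn

    n = len(clean_acts)
    # Stack both sets and flatten each activation to a vector.
    stacked = np.concatenate([clean_acts, pert_acts]).reshape(2 * n, -1)
    emb = TSNE(n_components=2, perplexity=perplexity, random_state=seed).fit_transform(stacked)
    return emb[:n], emb[n:]


def embedding_shift(clean_emb, pert_emb, eps=1e-12):
    """Assumed metric: mean displacement of each image, divided by the clean map's spread."""
    clean_emb = np.asarray(clean_emb, dtype=float)
    pert_emb = np.asarray(pert_emb, dtype=float)
    # How far each perturbed point lands from its clean counterpart.
    displacement = np.linalg.norm(pert_emb - clean_emb, axis=1).mean()
    # Mean distance of clean points from their centroid, used to normalise the scale.
    spread = np.linalg.norm(clean_emb - clean_emb.mean(axis=0), axis=1).mean()
    return displacement / max(spread, eps)


def layerwise_shift(clean_by_layer, pert_by_layer, embed=joint_tsne):
    """Return {layer_name: shift} for dicts mapping layer names to activation arrays."""
    return {
        name: embedding_shift(*embed(clean_by_layer[name], pert_by_layer[name]))
        for name in clean_by_layer
    }


def demo():
    """Run the sketch on synthetic activations (random data, not CIFAR-10 results)."""
    rng = np.random.default_rng(0)
    clean = {"conv1": rng.normal(size=(100, 16)), "fc": rng.normal(size=(100, 8))}
    # Stand-in for adversarial perturbation: small additive noise on the activations.
    pert = {name: acts + 0.1 * rng.normal(size=acts.shape) for name, acts in clean.items()}
    for name, shift in layerwise_shift(clean, pert).items():
        print(f"{name}: {shift:.3f}")
```

Calling `demo()` prints one score per layer for the synthetic data. In a real analysis the two dictionaries would hold activations recorded from the same network on clean images and on their adversarial counterparts, and a layer with a comparatively large score would be a candidate weak spot to inspect on the t-SNE map.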
