LG · Mar 18, 2021

Explainable Adversarial Attacks in Deep Neural Networks Using Activation Profiles

arXiv:2103.10229v1 · 10 citations
Originality: Incremental advance
AI Analysis

This work addresses adversarial vulnerability in neural networks, a critical issue for AI safety and reliability. It appears to be an incremental advance, however, because it builds on existing work on model fragility.

The paper tackles the problem of understanding how adversarial examples fool deep neural networks. It presents a visual framework that reveals how a model's perception of adversarial data differs from its perception of regular data, and shows how this comparison can identify the areas of a model that an attack exploits and guide improvements to training and architecture.
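The summary does not say which attacks the paper studies. To make "adversarial example" concrete, here is a minimal sketch of the fast gradient sign method (FGSM), a standard attack, applied to a toy logistic-regression model rather than a deep network. The model weights, input, and step size are made up for illustration, and FGSM is not necessarily the attack used in the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_logistic(x, y, w, b, epsilon):
    """FGSM for a binary logistic-regression model (illustrative toy setting).

    Computes x_adv = x + epsilon * sign(d loss / d x), where the loss is
    binary cross-entropy. For logistic regression, d loss / d logit = p - y
    and d logit / d x = w, so the input gradient is (p - y) * w.
    """
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w
    return x + epsilon * np.sign(grad_x)

# Hypothetical model and input: the clean point is classified as class 1.
w = np.array([2.0, -1.0])
b = 0.0
x = np.array([0.3, 0.2])
x_adv = fgsm_logistic(x, y=1, w=w, b=b, epsilon=0.2)
```

Each feature moves by at most `epsilon`, yet in this toy case the logit drops from 0.4 to -0.2, so the prediction flips from class 1 to class 0.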

As neural networks become the tool of choice to solve an increasing variety of problems in our society, adversarial attacks become critical. The possibility of generating data instances deliberately designed to fool a network's analysis can have disastrous consequences. Recent work has shown that commonly used methods for model training often result in fragile abstract representations that are particularly vulnerable to such attacks. This paper presents a visual framework to investigate neural network models subjected to adversarial examples, revealing how a model's perception of adversarial data differs from its perception of regular data instances and how both relate to class perception. Through different use cases, we show how observing these elements can quickly pinpoint exploited areas in a model, allowing further study of vulnerable features in input data and serving as a guide to improving model training and architecture.
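The abstract does not define how activation profiles are computed. A minimal sketch, assuming a class's "activation profile" is the mean hidden-layer activation over that class's samples, and using cosine similarity (our choice, not necessarily the paper's) to ask which class an input resembles internally. All function names and the toy activations are hypothetical.

```python
import numpy as np

def class_activation_profiles(activations, labels, num_classes):
    """Mean activation vector per class (assumed definition of a profile).

    activations: (n_samples, n_units) hidden-layer activations
    labels: (n_samples,) integer class labels
    Returns a (num_classes, n_units) array; classes with no samples stay zero.
    """
    activations = np.asarray(activations, dtype=float)
    labels = np.asarray(labels)
    profiles = np.zeros((num_classes, activations.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            profiles[c] = activations[mask].mean(axis=0)
    return profiles

def compare_to_profiles(sample_activation, profiles):
    """Cosine similarity between one sample's activations and each class profile."""
    a = np.asarray(sample_activation, dtype=float)
    norms = np.linalg.norm(profiles, axis=1) * np.linalg.norm(a)
    norms = np.where(norms == 0, 1.0, norms)  # avoid division by zero
    return profiles @ a / norms

def most_deviating_units(sample_activation, profile, k=3):
    """Indices of the k units where the sample departs most from a class profile."""
    diff = np.abs(np.asarray(sample_activation, dtype=float) - profile)
    return np.argsort(-diff)[:k]

# Toy activations for two classes over three hidden units.
acts = np.array([[1.0, 0.0, 0.0], [0.8, 0.2, 0.0],
                 [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]])
labels = np.array([0, 0, 1, 1])
profiles = class_activation_profiles(acts, labels, num_classes=2)

# A hypothetical adversarial input whose true class is 0.
adv = np.array([0.1, 0.9, 0.5])
sims = compare_to_profiles(adv, profiles)
suspect_units = most_deviating_units(adv, profiles[0], k=2)
```

Here the adversarial input's activations sit far closer to the class-1 profile than to its true class-0 profile, and the units that deviate most from the class-0 profile are candidates for the "exploited areas" the abstract describes.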
