LG · CR · CV · ML · Mar 6, 2020

Explaining Away Attacks Against Neural Networks

arXiv:2003.05748v1 · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the vulnerability of neural networks to adversarial attacks, a critical issue for deploying AI in sensitive applications. The contribution appears incremental, since it builds on existing explanation methods.

The paper tackles the detection of adversarial attacks on image-based neural networks. It shows that a model's explanations differ significantly between clean and adversarial inputs, and proposes a framework that flags an input as adversarial based on those explanations.

We investigate the problem of identifying adversarial attacks on image-based neural networks. We present intriguing experimental results showing significant discrepancies between the explanations generated for the predictions of a model on clean and adversarial data. Utilizing this intuition, we propose a framework which can identify whether a given input is adversarial based on the explanations given by the model. Code for our experiments can be found here: https://github.com/seansaito/Explaining-Away-Attacks-Against-Neural-Networks.
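The core idea (explain each prediction, then classify the explanation as coming from a clean or an adversarial input) can be illustrated with a minimal, self-contained sketch. This is not the authors' implementation; the explanation method and detector they use may differ. Everything here is an assumption for illustration: a fixed linear "model" stands in for the image classifier so its gradient is available in closed form, the data are synthetic, adversarial inputs come from the Fast Gradient Sign Method (FGSM), explanations are gradient-times-input saliency, and the detector is a nearest-centroid classifier. All helper names (`fgsm`, `explain`, `features`, `fit_detector`, `make_split`) are hypothetical.

```python
import math
import random

# Hypothetical toy setup: a fixed linear "model" stands in for a neural network
# so that the input gradient can be written in closed form.
DIM = 10
WEIGHTS = [1.0] * DIM  # logit = sum_i w_i * x_i + BIAS
BIAS = 0.0

def logit(x):
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x):
    return 1 if logit(x) > 0 else 0

def fgsm(x, y, eps):
    """FGSM: step eps along sign(d loss / d x). For binary cross-entropy on a
    linear logit, d loss / d x_i = (sigmoid(z) - y) * w_i."""
    coeff = sigmoid(logit(x)) - y
    return [xi + eps * math.copysign(1.0, coeff * w) for xi, w in zip(x, WEIGHTS)]

def explain(x):
    """Gradient-times-input saliency; the gradient of a linear logit is w."""
    return [w * xi for w, xi in zip(WEIGHTS, x)]

def features(x):
    # Absolute attributions, so the detector does not depend on the predicted class.
    return [abs(a) for a in explain(x)]

def sample_clean(rng, y):
    # Only the first two features carry the class signal; the rest are small noise.
    center = 1.0 if y == 1 else -1.0
    return [(center if i < 2 else 0.0) + rng.gauss(0.0, 0.1) for i in range(DIM)]

def centroid(vectors):
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def fit_detector(clean_feats, adv_feats):
    """Nearest-centroid detector trained on explanation features."""
    c_clean, c_adv = centroid(clean_feats), centroid(adv_feats)
    def is_adversarial(x):
        f = features(x)
        return sq_dist(f, c_adv) < sq_dist(f, c_clean)
    return is_adversarial

def make_split(rng, n, eps):
    clean, adv, labels = [], [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = sample_clean(rng, y)
        clean.append(x)
        adv.append(fgsm(x, y, eps))
        labels.append(y)
    return clean, adv, labels

rng = random.Random(0)
EPS = 0.5
train_clean, train_adv, _ = make_split(rng, 200, EPS)
detector = fit_detector([features(x) for x in train_clean],
                        [features(x) for x in train_adv])

test_clean, test_adv, test_labels = make_split(rng, 100, EPS)
# Fraction of adversarial inputs whose prediction differs from the true label.
attack_success_rate = sum(predict(x) != y for x, y in zip(test_adv, test_labels)) / len(test_adv)
# Clean inputs should be flagged False, adversarial inputs True.
correct = sum(not detector(x) for x in test_clean) + sum(detector(x) for x in test_adv)
detection_accuracy = correct / (len(test_clean) + len(test_adv))
print(f"attack success rate: {attack_success_rate:.2f}")
print(f"detection accuracy: {detection_accuracy:.2f}")
```

In this toy setting the attack flips almost every prediction, yet the saliency maps of adversarial inputs spread attribution across features the clean data never rely on (features 3 to 10 here), which is the kind of explanation discrepancy the abstract describes. On real image models one would swap in an actual network, an attribution method such as integrated gradients or SHAP, and a stronger detector; the structure of the pipeline stays the same.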

Code Implementations: 1 repo
