LG | CR | ML | May 22, 2019

Detecting Adversarial Examples and Other Misclassifications in Neural Networks by Introspection

arXiv:1905.09186v1 | 31 citations
Originality: Incremental advance
AI Analysis

The paper addresses the unreliability of neural network confidence scores as a signal for detecting errors, which matters for safety-critical applications. The advance is incremental because it builds on existing logit-based methods.

The paper tackles neural networks' overconfidence in their own misclassifications, including adversarial examples and out-of-distribution data. Its approach, which the authors call introspection, trains a simple 3-layer neural network on the logit activations of an already pretrained network and detects misclassifications at a competitive level.

Despite having excellent performances for a wide variety of tasks, modern neural networks are unable to provide a reliable confidence value allowing to detect misclassifications. This limitation is at the heart of what is known as an adversarial example, where the network provides a wrong prediction associated with a strong confidence to a slightly modified image. Moreover, this overconfidence issue has also been observed for regular errors and out-of-distribution data. We tackle this problem by what we call introspection, i.e. using the information provided by the logits of an already pretrained neural network. We show that by training a simple 3-layers neural network on top of the logit activations, we are able to detect misclassifications at a competitive level.
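The introspection idea in the abstract can be illustrated with a minimal sketch: a small detector network reads the logits of a frozen classifier and outputs the probability that the classifier's prediction is wrong. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names (`init_detector`, `train_detector`, `error_probability`, `make_synthetic_logits`), the reading of "3-layers" as three weight layers, the hidden width, the plain gradient-descent training loop, and above all the synthetic logits, which stand in for the outputs of a real pretrained network.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function: sigma(z) = (1 + tanh(z / 2)) / 2.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def init_detector(n_logits, hidden=32, seed=0):
    # Three weight layers (assumed sizes): n_logits -> hidden -> hidden -> 1.
    rng = np.random.default_rng(seed)
    sizes = [n_logits, hidden, hidden, 1]
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    # Returns the input followed by each layer's output; ReLU on hidden layers only.
    acts = [x]
    for i, (w, b) in enumerate(params):
        z = acts[-1] @ w + b
        acts.append(z if i == len(params) - 1 else np.maximum(z, 0.0))
    return acts

def error_probability(params, logits):
    # Probability that the underlying classifier misclassified each input.
    return sigmoid(forward(params, logits)[-1][:, 0])

def train_detector(logits, is_error, epochs=600, lr=0.03, hidden=32, seed=0):
    # Full-batch gradient descent on the mean binary cross-entropy.
    params = init_detector(logits.shape[1], hidden, seed)
    y = np.asarray(is_error, dtype=float)[:, None]
    for _ in range(epochs):
        acts = forward(params, logits)
        grad = (sigmoid(acts[-1]) - y) / len(y)  # dLoss/d(output pre-activation)
        for i in reversed(range(len(params))):
            w, b = params[i]
            grad_w, grad_b = acts[i].T @ grad, grad.sum(axis=0)
            if i > 0:
                # Backpropagate through the ReLU that produced acts[i].
                grad = (grad @ w.T) * (acts[i] > 0)
            params[i] = (w - lr * grad_w, b - lr * grad_b)
    return params

def make_synthetic_logits(n=2000, n_classes=10, seed=0):
    # Made-up data: "correct" rows get one dominant logit, "error" rows a weak one.
    rng = np.random.default_rng(seed)
    is_error = rng.random(n) < 0.5
    logits = rng.normal(size=(n, n_classes))
    top = rng.integers(n_classes, size=n)
    logits[np.arange(n), top] += np.where(is_error, 1.0, 6.0)
    return logits, is_error

logits, is_error = make_synthetic_logits()
x = (logits - logits.mean()) / logits.std()  # global standardization of the logits
params = train_detector(x, is_error)
probs = error_probability(params, x)
accuracy = np.mean((probs > 0.5) == is_error)
print(f"training-set detection accuracy: {accuracy:.3f}")
```

The accuracy printed here is measured on the same synthetic rows the detector was trained on, so it only shows that the pipeline runs end to end. In a real setting the logits would come from a pretrained classifier, the error labels from comparing its predictions with ground truth, and the detector would be evaluated on held-out data.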

