LG · CV · ML · Dec 13, 2018

Why ReLU networks yield high-confidence predictions far away from the training data and how to mitigate the problem

arXiv:1812.05720v2 · 626 citations
Originality: Incremental advance
AI Analysis

This paper addresses a safety issue in deploying classifiers for critical applications, though it is an incremental improvement on existing methods such as adversarial training.

The paper tackles the problem of ReLU neural networks producing high-confidence predictions on out-of-distribution data, which is unsafe for critical systems. It proposes a robust optimization technique that reduces such confidence while maintaining performance on the original task.

Classifiers used in the wild, in particular for safety-critical systems, should not only have good generalization properties but also should know when they don't know, in particular make low confidence predictions far away from the training data. We show that ReLU type neural networks which yield a piecewise linear classifier function fail in this regard as they produce almost always high confidence predictions far away from the training data. For bounded domains like images we propose a new robust optimization technique similar to adversarial training which enforces low confidence predictions far away from the training data. We show that this technique is surprisingly effective in reducing the confidence of predictions far away from the training data while maintaining high confidence predictions and test error on the original classification task compared to standard training.
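The claim that piecewise linear ReLU classifiers become overconfident far from the data can be illustrated with a small numerical sketch. It is not the paper's experiment: the two-layer network below has random, untrained weights, and all names (`relu_net`, `mean_confidence`, the layer sizes, the scaling factors `alpha`) are assumptions made for illustration. The idea is that, far from the origin, the logits grow roughly linearly along a fixed direction, so the gap between the largest and second-largest logit widens and the softmax confidence tends toward 1.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def relu_net(x, params):
    """Two-layer ReLU network: a piecewise linear map from inputs to logits."""
    (w1, b1), (w2, b2) = params
    hidden = np.maximum(x @ w1 + b1, 0.0)
    return hidden @ w2 + b2


def mean_confidence(alpha, directions, params):
    """Mean of the maximum softmax probability at the points alpha * direction."""
    probs = softmax(relu_net(alpha * directions, params))
    return float(probs.max(axis=-1).mean())


# Hypothetical toy setup: random weights stand in for a trained classifier.
rng = np.random.default_rng(0)
d, h, k = 20, 64, 10  # input dimension, hidden units, classes (illustrative)
params = [
    (rng.normal(0.0, 1.0 / np.sqrt(d), (d, h)), rng.normal(0.0, 0.1, h)),
    (rng.normal(0.0, 1.0 / np.sqrt(h), (h, k)), rng.normal(0.0, 0.1, k)),
]

# Random unit-norm directions; scaling by alpha moves the inputs far away.
directions = rng.normal(size=(200, d))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

for alpha in (1.0, 10.0, 100.0, 1000.0, 10000.0):
    conf = mean_confidence(alpha, directions, params)
    print(f"alpha={alpha:>8.0f}  mean max-softmax confidence={conf:.4f}")
```

Running the loop prints the mean confidence as the inputs are moved further out along each direction. In this toy setting the confidence should rise with `alpha`, which matches the qualitative behaviour the abstract describes; it says nothing about how well the proposed training technique counteracts it.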

Code Implementations: 1 repo
