Towards neural networks that provably know when they don't know
This paper addresses a critical safety issue in AI applications: neural networks must reliably detect when they are uncertain, especially in high-stakes domains.
The paper tackles the problem of neural networks making overconfident predictions on out-of-distribution data. It proposes a new approach that provides provable guarantees of low confidence in such cases, retaining state-of-the-art out-of-distribution detection performance while certifying worst-case behavior.
It has recently been shown that ReLU networks produce arbitrarily over-confident predictions far away from the training data. Thus, ReLU networks do not know when they don't know. Knowing this, however, is a highly important property in safety-critical applications. In the context of out-of-distribution (OOD) detection there have been a number of proposals to mitigate this problem, but none of them provides any mathematical guarantees. In this paper we propose a new approach to OOD detection which overcomes both problems: over-confidence far away from the training data and the lack of guarantees. Our approach can be used with ReLU networks and provides provably low-confidence predictions far away from the training data, as well as the first certificates for low-confidence predictions in a neighborhood of an out-distribution point. In the experiments we show that state-of-the-art methods fail in this worst-case setting, whereas our model can guarantee its performance while retaining state-of-the-art OOD performance.
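The over-confidence of ReLU networks far from the training data can be illustrated with a minimal sketch. The tiny network below, its hand-picked weights, and the input direction `x` are assumptions made purely for illustration; they are not taken from the paper, and the sketch does not implement the proposed approach. It only shows that scaling an input away from the origin drives the maximum softmax confidence of a piecewise affine classifier towards 1.

```python
import numpy as np

def softmax(z):
    # Subtract the largest logit for numerical stability.
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical hand-picked weights (not from the paper):
# 2 inputs -> 4 ReLU units -> 3 classes.
W1 = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b1 = np.zeros(4)
W2 = np.array([[1.0, 0.0, 0.0, 0.0],
               [0.0, 1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0, 1.0]])
b2 = np.zeros(3)

def relu_net(x):
    # One hidden ReLU layer followed by a linear output layer (logits).
    h = np.maximum(W1 @ x + b1, 0.0)
    return W2 @ h + b2

x = np.array([1.0, 0.5])       # arbitrary input direction (assumption)
scales = [1.0, 10.0, 100.0]    # move further and further along that direction

# Maximum softmax probability at each scaled input.
confidences = [float(softmax(relu_net(a * x)).max()) for a in scales]

for a, c in zip(scales, confidences):
    print(f"scale={a:g}  max softmax confidence={c:.4f}")
```

Along a fixed direction the logits of this network grow linearly with the scale, so the gap between the largest logit and the others widens and the softmax saturates, regardless of whether the scaled input resembles any training data.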