RAT: Boosting Misclassification Detection Ability without Extra Data
The work targets safety-critical applications such as autonomous driving and healthcare by improving misclassification detection without extra data. The contribution is incremental in that it builds on existing adversarial perturbation concepts.
The paper tackles the problem of detecting misclassifications in deep neural networks for image classification by using robust radius as a confidence metric and introducing Radius Aware Training (RAT). It reports up to a 29.3% reduction in AURC and a 21.62% reduction in FPR@95TPR compared with previous methods.
As deep neural networks (DNNs) become increasingly prevalent, particularly in high-stakes areas such as autonomous driving and healthcare, the ability to detect a model's incorrect predictions and intervene accordingly becomes crucial for safety. In this work, we investigate the detection of misclassified inputs for image classification models through the lens of adversarial perturbation: we propose to use robust radius (a.k.a. input-space margin) as a confidence metric and design two efficient estimation algorithms, RR-BS and RR-Fast, for misclassification detection. Furthermore, we design a training method called Radius Aware Training (RAT) to boost models' ability to identify mistakes. Extensive experiments show that our method achieves up to a 29.3% reduction in AURC and a 21.62% reduction in FPR@95TPR compared with previous methods.
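To make the idea of robust radius as a confidence score concrete, the following is a minimal sketch and not the paper's RR-BS or RR-Fast, whose details are not given here. It assumes a hypothetical linear binary classifier (`w`, `b`), perturbs the input along the gradient direction, and uses bisection to find the smallest step that flips the prediction. All function names and parameters (`predict`, `radius_along_direction`, `robust_radius`, `hi`, `steps`) are illustrative assumptions. For a linear model the flip is monotone in the step size, so bisection is valid; for a deep network that would only hold approximately.

```python
import math


def predict(w, b, x):
    """Hypothetical linear binary classifier: returns class 0 or 1."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else 0


def radius_along_direction(w, b, x, direction, hi=10.0, steps=40):
    """Bisection for the smallest step along `direction` that flips the label.

    Returns `hi` unchanged if no flip occurs within the search range.
    """
    label = predict(w, b, x)

    def flipped(eps):
        moved = [xi + eps * di for xi, di in zip(x, direction)]
        return predict(w, b, moved) != label

    if not flipped(hi):
        return hi
    lo = 0.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if flipped(mid):
            hi = mid  # a flip happens at or before mid
        else:
            lo = mid  # no flip yet, search larger steps
    return hi


def robust_radius(w, b, x):
    """Estimate the L2 robust radius by searching toward the decision boundary."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm = math.sqrt(sum(wi * wi for wi in w))
    sign = -1.0 if score >= 0 else 1.0  # move against the current score
    direction = [sign * wi / norm for wi in w]
    return radius_along_direction(w, b, x, direction)


# Assumed toy weights: the exact radius is |w.x + b| / ||w||.
w, b = [3.0, 4.0], 0.0
far = robust_radius(w, b, [1.0, 1.0])    # exact value: 7 / 5
near = robust_radius(w, b, [0.1, 0.1])   # exact value: 0.7 / 5
```

Under this reading, an input with a small estimated radius (such as `near`) sits close to the decision boundary and would be flagged as a more likely misclassification than one with a large radius (such as `far`).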