An Algorithm for Out-Of-Distribution Attack to Neural Network Encoder
The proposed attack exposes a critical security flaw in out-of-distribution (OOD) detection methods for AI systems, potentially compromising their reliability in real-world applications.
The paper examines out-of-distribution (OOD) detection in deep neural networks and shows that existing methods are vulnerable to attack: classification-based methods have no theoretical guarantee and can be broken because of dimensionality reduction in the models, and Glow likelihood-based detection can be broken as well.
Deep neural networks (DNNs), especially convolutional neural networks, have achieved superior performance on image classification tasks. However, such performance is only guaranteed if the input to a trained model is similar to the training samples, i.e., the input follows the probability distribution of the training set. Out-Of-Distribution (OOD) samples do not follow the distribution of the training set, and therefore the class labels predicted for OOD samples are meaningless. Classification-based methods have been proposed for OOD detection; however, in this study we show that this type of method has no theoretical guarantee and is practically breakable by our OOD Attack algorithm because of dimensionality reduction in the DNN models. We also show that Glow likelihood-based OOD detection is breakable as well.
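To make the dimensionality-reduction argument concrete, the following is a minimal sketch and not the paper's OOD Attack algorithm. It assumes a toy linear encoder `W` that maps 4-D inputs to 2-D codes, standing in for a DNN encoder. The names `encode`, `ood_attack_sketch`, `x_in`, and `x_out` are hypothetical. Because the encoder discards dimensions, gradient descent started from an arbitrary input can reach a point whose code matches that of an in-distribution sample while the input itself stays far from that sample.

```python
# Toy linear "encoder" from 4-D inputs to 2-D codes (assumed stand-in for a DNN).
W = [[1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0]]

def encode(x):
    """Apply the toy encoder: code = W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def ood_attack_sketch(x_in, x_out, lr=0.25, steps=100):
    """Gradient descent on 0.5 * ||encode(x) - encode(x_in)||^2, starting at x_out."""
    target = encode(x_in)
    x = list(x_out)
    for _ in range(steps):
        residual = [c - t for c, t in zip(encode(x), target)]
        # Gradient of the loss with respect to x is W^T @ residual.
        grad = [sum(W[i][j] * residual[i] for i in range(len(W)))
                for j in range(len(x))]
        x = [xj - lr * g for xj, g in zip(x, grad)]
    return x

x_in = [1.0, 2.0, 3.0, 4.0]     # hypothetical in-distribution sample
x_out = [10.0, -5.0, 0.0, 7.0]  # hypothetical OOD starting point
x_adv = ood_attack_sketch(x_in, x_out)

# Largest coordinate-wise gap between the two codes, and between the two inputs.
code_gap = max(abs(a - b) for a, b in zip(encode(x_adv), encode(x_in)))
input_gap = max(abs(a - b) for a, b in zip(x_adv, x_in))
print("code gap:", code_gap, "input gap:", input_gap)
```

In this toy setting the codes of `x_adv` and `x_in` coincide although the inputs differ substantially, so any detector that looks only at the code cannot separate them. A real DNN encoder is nonlinear, so this linear example illustrates the intuition only.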