Are generative deep models for novelty detection truly better?
This work evaluates generative deep models for anomaly detection and shows that they are not inherently better than simpler methods. The result matters to machine learning researchers and practitioners because it challenges a common assumption and highlights practical limitations of these models.
The paper compares generative deep models and classical methods for anomaly detection across many non-image datasets. It finds that the generative models' performance depends heavily on hyperparameter selection and deteriorates as fewer anomalous samples are available for that selection, and that none of them systematically outperforms kNN in practical scenarios.
Many deep models have recently been proposed for anomaly detection. This paper presents a comparison of selected generative deep models and classical anomaly detection methods on a large number of non-image benchmark datasets. We provide a statistical comparison of the selected models across many configurations, architectures, and hyperparameters. We conclude that the performance of the generative models is determined by the process used to select their hyperparameters. Specifically, the performance of the deep generative models deteriorates as the number of anomalous samples used in hyperparameter selection decreases. In practical anomaly detection scenarios, none of the deep generative models systematically outperforms kNN.
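For readers unfamiliar with the kNN baseline, the following is a minimal sketch of one common way to use kNN for anomaly detection: each query point is scored by its distance to its k-th nearest neighbour in a training set of normal samples. The abstract does not specify the exact kNN variant the paper uses, so the scoring rule, the function name `knn_anomaly_scores`, the choice `k=5`, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def knn_anomaly_scores(train, query, k=5):
    """Score each query point by its distance to the k-th nearest training point.

    Hypothetical helper for illustration; larger scores suggest more anomalous points.
    """
    train = np.asarray(train, dtype=float)
    query = np.asarray(query, dtype=float)
    # Pairwise Euclidean distances, shape (n_query, n_train).
    diffs = query[:, None, :] - train[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # k-th smallest distance in each row (k is 1-indexed).
    return np.partition(dists, k - 1, axis=1)[:, k - 1]


# Synthetic data (assumed for the example): normal samples around the origin.
rng = np.random.default_rng(0)
normal_train = rng.normal(0.0, 1.0, size=(200, 2))

# One query near the bulk of the data, one far from it.
queries = np.array([[0.0, 0.0], [8.0, 8.0]])
scores = knn_anomaly_scores(normal_train, queries, k=5)
print(scores)
```

The only hyperparameter in this sketch is `k`, and the score needs no labelled anomalies to compute. This may help explain why such a baseline is less sensitive to how many anomalous samples are available for hyperparameter selection.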