Autoencoders for Anomaly Detection are Unreliable
This work highlights a critical unreliability in autoencoders for anomaly detection. The contribution is incremental in that it builds on prior skepticism, but it adds theoretical and experimental evidence and points to risks in safety-critical applications.
The paper demonstrates that autoencoders, commonly used for anomaly detection, can perfectly reconstruct anomalies, contradicting the assumption that they reconstruct normal data more accurately than anomalous data. It shows this failure in both linear and non-linear models, on tabular and image data.
Autoencoders are frequently used for anomaly detection, in both the unsupervised and semi-supervised settings. They rely on the assumption that, when trained using the reconstruction loss, they will be able to reconstruct normal data more accurately than anomalous data. Some recent works have posited that this assumption may not always hold, but little has been done to study its validity in theory. In this work we show that this assumption indeed does not hold, and illustrate that anomalies lying far away from normal data can be perfectly reconstructed in practice. We revisit the theory of failure of linear autoencoders for anomaly detection by showing how they can perfectly reconstruct out of bounds, or extrapolate undesirably, and note how this can be dangerous in safety-critical applications. We connect this to non-linear autoencoders through experiments on both tabular data and real-world image data, the two primary application areas of autoencoders for anomaly detection.
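The linear failure mode described above can be illustrated with a minimal sketch. It is not the paper's experimental setup. It assumes a linear autoencoder that has converged to the optimal squared-error solution, which spans the principal subspace of the data, and emulates that solution with an SVD instead of gradient training. The synthetic 2-D data, the helper names fit_linear_autoencoder and reconstruction_error, and the distance of 1000 are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_linear_autoencoder(X, k):
    """Return (mean, components) spanning the top-k principal subspace.

    Assumption: this stands in for a trained linear autoencoder with a
    k-dimensional bottleneck, whose optimal solution spans this subspace.
    """
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, sorted by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def reconstruction_error(x, mean, components):
    """Euclidean reconstruction error, used here as the anomaly score."""
    z = (x - mean) @ components.T      # encode into the bottleneck
    x_hat = mean + z @ components      # decode back to input space
    return np.linalg.norm(x - x_hat, axis=-1)

rng = np.random.default_rng(0)
# Synthetic "normal" data: wide along one axis, narrow along the other.
X = rng.normal(size=(500, 2)) * np.array([1.0, 0.1])
mean, comps = fit_linear_autoencoder(X, k=1)

# A point very far from all normal data, but lying in the learned subspace.
anomaly = mean + 1000.0 * comps[0]

normal_errors = reconstruction_error(X, mean, comps)
anomaly_error = reconstruction_error(anomaly, mean, comps)

print("median error on normal data:", np.median(normal_errors))
print("error on far-away anomaly:  ", anomaly_error)
```

Under these assumptions the far-away point is reconstructed up to floating-point error, so a reconstruction-error threshold calibrated on normal data would not flag it. The sketch covers only the linear case, whereas the abstract's claims about non-linear autoencoders rest on the paper's own experiments.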