A Hierarchically Feature Reconstructed Autoencoder for Unsupervised Anomaly Detection
This work addresses the problem of detecting and localizing anomalies in computer vision applications without manual annotations, though its contribution is incremental: it mainly simplifies existing methods.
The paper tackles unsupervised anomaly detection and localization by proposing a simple autoencoder architecture in which a decoder reconstructs the hierarchical features extracted by a pre-trained encoder, without needing data augmentations or anomalous training images. The authors report state-of-the-art performance on the MNIST, Fashion-MNIST, CIFAR-10, and MVTec datasets.
Anomaly detection and localization without manual annotations or prior knowledge is a challenging task in the unsupervised learning setting. Existing works achieve excellent anomaly detection performance, but rely on complex networks or cumbersome pipelines. To address this issue, this paper explores a simple but effective architecture for anomaly detection. It consists of a well pre-trained encoder that extracts hierarchical feature representations and a decoder that reconstructs these intermediate features from the encoder. In particular, it does not require any data augmentations or anomalous images for training. Anomalies are detected when the decoder fails to reconstruct the features well, and the hierarchical feature reconstruction errors are then aggregated into an anomaly map to achieve anomaly localization. Comparing the encoder and decoder features across the hierarchy leads to more accurate and robust localization results than the single-feature or pixel-by-pixel comparisons used in conventional works. Experimental results show that the proposed method outperforms state-of-the-art methods on the MNIST, Fashion-MNIST, CIFAR-10, and MVTec Anomaly Detection datasets in both anomaly detection and localization.
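The aggregation step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the error metric, the resizing scheme, or the aggregation rule, so everything below is an assumption made for illustration: squared L2 distance over channels as the per-location error, nearest-neighbour upsampling to the input resolution, summation across levels, and the maximum of the map as the image-level score. The function names (`reconstruction_error`, `upsample_nearest`, `anomaly_map`, `anomaly_score`) and the toy feature values are hypothetical and are not taken from the paper.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]
Grid = List[List[float]]              # indexed as [row][col]


def reconstruction_error(enc: FeatureMap, dec: FeatureMap) -> Grid:
    """Squared L2 distance over channels at each spatial location (assumed metric)."""
    channels, height, width = len(enc), len(enc[0]), len(enc[0][0])
    return [
        [
            sum((enc[c][i][j] - dec[c][i][j]) ** 2 for c in range(channels))
            for j in range(width)
        ]
        for i in range(height)
    ]


def upsample_nearest(grid: Grid, out_h: int, out_w: int) -> Grid:
    """Nearest-neighbour resize of a 2-D grid to (out_h, out_w)."""
    in_h, in_w = len(grid), len(grid[0])
    return [
        [grid[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]


def anomaly_map(enc_feats: List[FeatureMap], dec_feats: List[FeatureMap],
                out_h: int, out_w: int) -> Grid:
    """Sum the upsampled per-level errors into a single map (assumed aggregation)."""
    total = [[0.0] * out_w for _ in range(out_h)]
    for enc, dec in zip(enc_feats, dec_feats):
        err = upsample_nearest(reconstruction_error(enc, dec), out_h, out_w)
        for i in range(out_h):
            for j in range(out_w):
                total[i][j] += err[i][j]
    return total


def anomaly_score(amap: Grid) -> float:
    """Image-level score: the largest value in the anomaly map (assumed rule)."""
    return max(max(row) for row in amap)


# Toy data for a 4x4 input with two feature levels (hypothetical values).
# Level 1: 1 channel at 4x4; the decoder misses one location, (3, 3).
enc_l1 = [[[1.0] * 4 for _ in range(4)]]
dec_l1 = [[[1.0] * 4 for _ in range(4)]]
dec_l1[0][3][3] = 0.0

# Level 2: 2 channels at 2x2; the decoder is off in channel 0 at (1, 1).
enc_l2 = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
dec_l2 = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
dec_l2[0][1][1] = 2.0

amap = anomaly_map([enc_l1, enc_l2], [dec_l1, dec_l2], out_h=4, out_w=4)
score = anomaly_score(amap)
```

In this toy case the coarse level flags the whole bottom-right quadrant, while the fine level sharpens the peak at a single location. This mirrors the abstract's argument that comparing features at several levels localizes anomalies better than comparing a single feature. In a real pipeline the encoder features would come from a pre-trained network and the decoder would be trained on normal images only. Both steps are omitted here.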