VT-ADL: A Vision Transformer Network for Image Anomaly Detection and Localization
This work addresses anomaly detection and localization in industrial images. Its contribution appears incremental, since it builds on existing transformer and reconstruction-based approaches.
The authors tackle image anomaly detection and localization by proposing VT-ADL, a vision transformer network that combines a reconstruction-based approach with patch embedding. They compare its results against state-of-the-art algorithms on publicly available datasets such as MNIST and MVTec.
We present a transformer-based image anomaly detection and localization network. Our proposed model combines a reconstruction-based approach with patch embedding. The use of transformer networks helps to preserve the spatial information of the embedded patches, which are later processed by a Gaussian mixture density network to localize the anomalous areas. We also publish BTAD, a real-world industrial anomaly dataset. Our results are compared with other state-of-the-art algorithms using publicly available datasets like MNIST and MVTec.
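To make the localization idea concrete, the following is a minimal sketch of patch-wise scoring under a Gaussian mixture. It is not the VT-ADL implementation. Several simplifications are assumptions made purely for illustration: raw pixel patches stand in for the transformer-encoded patch features, the mixture parameters are fixed by hand rather than predicted by a mixture density network, and the components are isotropic. The function names (`split_into_patches`, `gmm_nll`, `anomaly_map`) are hypothetical.

```python
import math


def split_into_patches(image, patch):
    """Split a 2D list (H x W) into non-overlapping patch x patch blocks.

    Patches are returned in row-major order, each flattened to a list.
    """
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by patch size")
    patches = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            patches.append(
                [image[r + i][c + j] for i in range(patch) for j in range(patch)]
            )
    return patches


def gmm_nll(x, weights, means, sigmas):
    """Negative log-likelihood of vector x under an isotropic Gaussian mixture.

    Assumption: each component k has weight weights[k], mean vector means[k],
    and a single standard deviation sigmas[k] shared by all dimensions.
    """
    d = len(x)
    log_terms = []
    for w, mu, s in zip(weights, means, sigmas):
        sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
        log_terms.append(
            math.log(w)
            - 0.5 * d * math.log(2 * math.pi * s * s)
            - sq_dist / (2 * s * s)
        )
    # Log-sum-exp over components for numerical stability.
    m = max(log_terms)
    return -(m + math.log(sum(math.exp(t - m) for t in log_terms)))


def anomaly_map(image, patch, weights, means, sigmas):
    """Return a grid of per-patch scores; a higher score means less likely."""
    cols = len(image[0]) // patch
    scores = [
        gmm_nll(p, weights, means, sigmas)
        for p in split_into_patches(image, patch)
    ]
    return [scores[i:i + cols] for i in range(0, len(scores), cols)]
```

Because the patches keep their row-major grid positions, the score grid returned by `anomaly_map` can be read directly as a coarse localization map: patches that the mixture explains poorly receive a higher negative log-likelihood than those it explains well.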