CV · Nov 22, 2022

Generalizable Industrial Visual Anomaly Detection with Self-Induction Vision Transformer

arXiv:2211.12311v3 · 9 citations · h-index: 11
Originality: Incremental advance
AI Analysis

This work addresses the need for more effective and generalizable anomaly detection in industrial manufacturing, though it appears incremental as it builds on existing Transformer and self-supervision techniques.

The paper tackles industrial visual anomaly detection by addressing limitations of existing methods, such as identity-mapping shortcuts and a lack of global context. It proposes a self-induction vision Transformer (SIVT) for unsupervised, generalizable multi-category detection, and reports improvements over the state of the art of 2.8-6.3 in AUROC and 3.3-7.6 in AP on the MVTec AD benchmarks.

Industrial visual anomaly detection plays a critical role in advanced intelligent manufacturing, but several limitations remain. First, existing reconstruction-based methods struggle with identity-mapping shortcuts, where the gap in reconstruction error between normal and abnormal samples becomes too small to separate them, leading to inferior detection capability. Second, previous studies have mainly concentrated on convolutional neural network (CNN) models, which capture the local semantics of objects but neglect the global context, also resulting in inferior performance. Third, existing studies follow an individual learning fashion in which each detection model handles only one product category, while generalizable detection across multiple categories has not been explored. To tackle these limitations, we propose a self-induction vision Transformer (SIVT) for unsupervised, generalizable multi-category industrial visual anomaly detection and localization. SIVT first extracts discriminative features from a pre-trained CNN as property descriptors. The self-induction vision Transformer then reconstructs the extracted features in a self-supervised fashion, with auxiliary induction tokens introduced to induce the semantics of the original signal. Finally, abnormal properties are detected using the residual difference between semantic features. We evaluated SIVT on the existing MVTec AD benchmarks; the results show that the proposed method advances state-of-the-art detection performance, with improvements of 2.8-6.3 in AUROC and 3.3-7.6 in AP.
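
The final step of the abstract, scoring anomalies from the residual between extracted and reconstructed features, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the distance or the aggregation, so the per-location L2 norm and the max-based image score are assumptions, and the names `residual_map` and `image_score` are hypothetical. The toy grids stand in for pre-trained CNN features and their Transformer reconstruction.

```python
import math


def residual_map(features, reconstructed):
    """Per-location anomaly score: L2 norm of (feature - reconstruction).

    Both inputs are H x W grids (nested lists) of equal-length feature vectors.
    The L2 norm is an assumed choice of residual, not taken from the paper.
    """
    return [
        [math.dist(f, r) for f, r in zip(f_row, r_row)]
        for f_row, r_row in zip(features, reconstructed)
    ]


def image_score(score_map):
    """Image-level score: the largest per-location residual (assumed aggregation)."""
    return max(max(row) for row in score_map)


# Toy 2x2 grid of 2-D feature vectors, standing in for pre-trained CNN features.
features = [[[1.0, 0.0], [0.0, 1.0]],
            [[1.0, 1.0], [3.0, 4.0]]]

# Hypothetical reconstruction: faithful everywhere except the bottom-right cell.
reconstructed = [[[1.0, 0.0], [0.0, 1.0]],
                 [[1.0, 1.0], [0.0, 0.0]]]

scores = residual_map(features, reconstructed)
print(scores)               # [[0.0, 0.0], [0.0, 5.0]]
print(image_score(scores))  # 5.0
```

In this toy case only the poorly reconstructed cell receives a nonzero score, which is the behavior the residual-based detection relies on: locations the model cannot reconstruct from normal patterns stand out in the score map, giving localization, and the image-level score flags the sample.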

