CV · May 3, 2024

Advancing Pre-trained Teacher: Towards Robust Feature Discrepancy for Anomaly Detection

arXiv:2405.02068v1 · 2 citations · h-index: 19
Originality: Incremental advance
AI Analysis

This work addresses industrial anomaly detection. It is an incremental improvement over existing knowledge distillation methods, aimed at making the teacher-student feature discrepancy more robust.

Knowledge distillation for industrial anomaly detection rests on idealized assumptions about the feature discrepancy between teacher and student, and these are hard to maintain in practice. The paper proposes a two-stage framework called AAND, which sequentially performs anomaly amplification and normality distillation to obtain a robust feature discrepancy. The authors report state-of-the-art performance on the MvTecAD, VisA, and MvTec3D-RGB datasets.

With the wide application of knowledge distillation between an ImageNet pre-trained teacher model and a learnable student model, industrial anomaly detection has made significant progress in the past few years. The success of knowledge distillation mainly relies on preserving the feature discrepancy between the teacher and student models, which assumes that: (1) the teacher model can jointly represent two different distributions for the normal and abnormal patterns, while (2) the student model can only reconstruct the normal distribution. However, maintaining these ideal assumptions in practice remains challenging. In this paper, we propose a simple yet effective two-stage industrial anomaly detection framework, termed AAND, which sequentially performs Anomaly Amplification and Normality Distillation to obtain a robust feature discrepancy. In the first stage, anomaly amplification, we propose a novel Residual Anomaly Amplification (RAA) module to advance the pre-trained teacher encoder. Exposed to synthetic anomalies, it amplifies anomalies via residual generation while maintaining the integrity of the pre-trained model. It mainly comprises a Matching-guided Residual Gate and an Attribute-scaling Residual Generator, which determine the residuals' proportion and characteristics, respectively. In the second stage, normality distillation, we employ a reverse distillation paradigm to train a student decoder, with a novel Hard Knowledge Distillation (HKD) loss built to better facilitate the reconstruction of normal patterns. Comprehensive experiments on the MvTecAD, VisA, and MvTec3D-RGB datasets show that our method achieves state-of-the-art performance.
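
The feature-discrepancy idea in the abstract can be illustrated with a minimal sketch. It assumes the anomaly score at each spatial location is the cosine distance between teacher and student feature vectors, a common choice in reverse-distillation methods; the abstract does not confirm that AAND scores anomalies this way. The function names (`cosine_distance`, `anomaly_map`, `image_score`) and the toy feature maps are hypothetical, and plain Python lists stand in for real network activations.

```python
import math


def cosine_distance(u, v, eps=1e-8):
    """Return 1 - cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v + eps)


def anomaly_map(teacher_feats, student_feats):
    """Per-location discrepancy between two H x W x C maps (nested lists)."""
    return [
        [cosine_distance(t, s) for t, s in zip(t_row, s_row)]
        for t_row, s_row in zip(teacher_feats, student_feats)
    ]


def image_score(amap):
    """Image-level score: the largest per-location discrepancy (an assumption)."""
    return max(max(row) for row in amap)


# Toy 2 x 2 maps with 2 channels. The student reproduces the teacher everywhere
# except the bottom-right location, mimicking a failure to reconstruct an anomaly.
teacher = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [1.0, 0.0]]]
student = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.0, 1.0]]]

amap = anomaly_map(teacher, student)
score = image_score(amap)
```

In this toy case the discrepancy is near zero wherever the student matches the teacher and large only at the mismatched location, which is the behaviour the two assumptions in the abstract are meant to guarantee.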
