CVApr 21

Feature Perturbation Pool-based Fusion Network for Unified Multi-Class Industrial Defect Detection

arXiv:2604.1925926.3
AI Analysis

For industrial quality inspection, this provides a unified framework that reduces computational overhead while improving robustness across diverse defect types.

FPFNet tackles multi-class industrial defect detection by unifying models across defect categories, achieving 97.17% image-level AUROC on MVTec-AD and 91.08% on VisA, outperforming prior methods without added parameters.

Multi-class defect detection constitutes a critical yet challenging task in industrial quality inspection, where existing approaches typically suffer from two fundamental limitations: (i) the necessity of training separate models for each defect category, resulting in substantial computational and memory overhead, and (ii) degraded robustness caused by inter-class feature perturbation when heterogeneous defect categories are jointly modeled. In this paper, we present FPFNet, a Feature Perturbation Pool-based Fusion Network that synergistically integrates a stochastic feature perturbation pool with a multi-layer feature fusion strategy to address these challenges within a unified detection framework. The feature perturbation pool enriches the training distribution by randomly injecting diverse noise patterns -- including Gaussian noise, F-Noise, and F-Drop -- into the extracted feature representations, thereby strengthening the model's robustness against domain shifts and unseen defect morphologies. Concurrently, the multi-layer feature fusion module aggregates hierarchical feature representations from both the encoder and decoder through residual connections and normalization, enabling the network to capture complex cross-scale relationships while preserving fine-grained spatial details essential for precise defect localization. Built upon the UniAD architecture~\cite{you2022unified}, our method achieves state-of-the-art performance on two widely adopted benchmarks: 97.17\% image-level AUROC and 96.93\% pixel-level AUROC on MVTec-AD, and 91.08\% image-level AUROC and 99.08\% pixel-level AUROC on VisA, surpassing existing methods by notable margins while introducing no additional learnable parameters or computational complexity.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes