CV · Apr 30, 2025

Learning Multi-view Multi-class Anomaly Detection

arXiv:2504.21294v16.21 citations · h-index: 10

Originality: Incremental advance

AI Analysis

The paper addresses anomaly detection across multiple views and multiple classes in computer vision. It appears incremental because it builds on existing multi-class anomaly detection frameworks.

Existing multi-class models perform poorly in multi-view settings. The paper introduces MVMCAD, which integrates information from multiple views and reports state-of-the-art image-level scores of 91.0/88.6/82.1 on the Real-IAD dataset.

The latest trend in anomaly detection is to train a unified model instead of training a separate model for each category. However, existing multi-class anomaly detection (MCAD) models perform poorly in multi-view scenarios because they often fail to model the relationships and complementary information among different views. In this paper, we introduce a Multi-View Multi-Class Anomaly Detection model (MVMCAD), which integrates information from multiple views to accurately identify anomalies. Specifically, we propose a semi-frozen encoder, in which a pre-encoder prior enhancement mechanism is added before the frozen encoder, enabling stable cross-view feature modeling and efficient adaptation. Furthermore, we propose an Anomaly Amplification Module (AAM) that models global token interactions and suppresses normal regions to enhance anomaly signals in multi-view settings. Finally, we propose a Cross-Feature Loss that aligns shallow encoder features with deep decoder features and vice versa, enhancing the model's sensitivity to anomalies at different semantic levels. Extensive experiments on the Real-IAD dataset for multi-view multi-class anomaly detection validate the effectiveness of our approach, which achieves state-of-the-art performance of 91.0/88.6/82.1 at the image level and 99.1/43.9/48.2/95.2 at the pixel level.
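
The Cross-Feature Loss described in the abstract pairs shallow encoder features with deep decoder features, and deep encoder features with shallow decoder features. The abstract does not give its exact form, so the following is only a minimal sketch under stated assumptions: cosine distance as the alignment measure, features represented as lists of per-token vectors, and encoder and decoder features already sharing the same token count and dimension. The function names (`cosine_distance`, `mean_token_distance`, `cross_feature_loss`) are hypothetical and not taken from the paper.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(u, v) for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    eps = 1e-8  # guards against division by zero for all-zero vectors
    return 1.0 - dot / (norm_u * norm_v + eps)


def mean_token_distance(feats_a, feats_b):
    """Average cosine distance over token-wise pairs of feature vectors."""
    if len(feats_a) != len(feats_b):
        raise ValueError("feature maps must have the same number of tokens")
    total = sum(cosine_distance(a, b) for a, b in zip(feats_a, feats_b))
    return total / len(feats_a)


def cross_feature_loss(enc_shallow, enc_deep, dec_shallow, dec_deep):
    """Assumed form: crossed alignment of encoder and decoder levels.

    shallow encoder <-> deep decoder, and deep encoder <-> shallow decoder.
    The choice of cosine distance and equal weighting is an assumption.
    """
    return (mean_token_distance(enc_shallow, dec_deep)
            + mean_token_distance(enc_deep, dec_shallow))


if __name__ == "__main__":
    # Toy features: two tokens, two channels each.
    enc_shallow = [[1.0, 0.0], [0.0, 1.0]]
    enc_deep = [[0.5, 0.5], [1.0, 0.0]]
    dec_shallow = [[0.5, 0.5], [1.0, 0.0]]
    dec_deep = [[1.0, 0.0], [0.0, 1.0]]
    print(cross_feature_loss(enc_shallow, enc_deep, dec_shallow, dec_deep))
```

In this sketch the loss is near zero when the crossed feature pairs point in the same direction and grows as they diverge. A real model would likely need projection layers to match feature shapes across levels, which this sketch leaves out.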
