Uni-RCM: Unified Reference-guided Cross-modal Mapping for Multi-Class Anomaly Detection
Uni-RCM enables scalable, unified anomaly detection across multiple product categories, addressing a key limitation of current multi-modal industrial inspection systems.
Uni-RCM achieves state-of-the-art multi-class anomaly detection on MVTec-3D AD by using a reference guide block to filter category-specific noise and an offline residual quantizer to model normal distributions, overcoming inter-class interference.
Multi-modal industrial anomaly detection typically relies on a separate model for each product category, which limits practical scalability. When shifting to a unified paradigm that handles diverse classes simultaneously, detection accuracy often degrades due to inter-class interference and feature manifold confusion. To overcome these challenges, we propose a Unified Reference-guided Cross-modal Mapping framework, named Uni-RCM. At its core is a reference guide block that dynamically filters out category-specific noise by introducing a learnable reference feature, which captures the commonalities across different modalities. In addition, an offline residual quantizer characterizes the normal distribution with multiple cascaded codebooks. Extensive evaluations on the MVTec-3D AD dataset demonstrate state-of-the-art performance in the challenging multi-class setting, for both image-level detection and pixel-level localization.
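
To make the idea of cascaded codebooks concrete, the following is a minimal sketch of generic residual vector quantization, not the Uni-RCM implementation. It assumes that each stage's codebook is fit offline with plain k-means on the residuals left by the previous stage, and that the norm of the final residual can serve as an anomaly score. The function names, the number of stages, and the codebook size are all hypothetical choices made for illustration.

```python
import numpy as np


def nearest(x, codebook):
    """Index of the closest codebook entry (squared L2) for each row of x."""
    dists = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd k-means; a stand-in for whatever fitting the paper uses."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = nearest(x, centers)
        for j in range(k):
            members = x[assign == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


def fit_residual_quantizer(normal_features, num_stages=3, codebook_size=8):
    """Fit cascaded codebooks offline on features of normal samples only."""
    codebooks = []
    residual = np.asarray(normal_features, dtype=float).copy()
    for stage in range(num_stages):
        codebook = kmeans(residual, codebook_size, seed=stage)
        codebooks.append(codebook)
        # The next stage models what this stage failed to explain.
        residual = residual - codebook[nearest(residual, codebook)]
    return codebooks


def anomaly_score(features, codebooks):
    """Norm of the residual that remains after all quantization stages."""
    residual = np.asarray(features, dtype=float).copy()
    for codebook in codebooks:
        residual = residual - codebook[nearest(residual, codebook)]
    return np.linalg.norm(residual, axis=1)
```

Under these assumptions, features that resemble the normal training data are absorbed stage by stage and leave a small residual, while features far from the normal distribution leave a large one. How Uni-RCM actually fits its codebooks and derives scores from them is not specified in the abstract.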