CV · AI · IV · Oct 20, 2025

2D_3D Feature Fusion via Cross-Modal Latent Synthesis and Attention Guided Restoration for Industrial Anomaly Detection

Usman Ali, Ali Zia, Abdul Rehman, Umer Ramzan, Zohaib Hassan, Talha Sattar, Jing Wang, Wei Xiang

arXiv:2510.21793v1 · 1 citation · h-index: 2 · Has Code · DICTA

Originality: Incremental advance

AI Analysis

This work aims to improve anomaly detection accuracy in industrial settings by fusing 2D and 3D data; it represents an incremental advance in domain-specific methods.

The paper tackles robust cross-modal fusion for industrial anomaly detection by proposing an unsupervised framework that synthesizes a unified latent space from RGB images and point clouds. It reports state-of-the-art mean image-level AUROC (I-AUROC) of 0.972 on MVTec 3D-AD and 0.901 on Eyecandies.
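I-AUROC is the area under the ROC curve computed from a single anomaly score per image. The sketch below computes it with the pairwise (Mann-Whitney) formulation, where ties count as half a win, and then averages per-category values, assuming the reported "mean" is taken over object categories as is common on these benchmarks. The function names are hypothetical and not taken from the MAFR code.

```python
from typing import Mapping, Sequence


def image_auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Image-level AUROC: the probability that a randomly chosen anomalous
    image (label 1) scores higher than a randomly chosen normal image (label 0).
    Ties contribute 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one anomalous image")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auroc(per_category: Mapping[str, float]) -> float:
    """Unweighted mean over categories (assumed averaging convention)."""
    return sum(per_category.values()) / len(per_category)
```

The quadratic pairwise loop is chosen for readability; a rank-based implementation gives the same value in O(n log n) and is preferable for large test sets.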

Industrial anomaly detection (IAD) increasingly benefits from integrating 2D and 3D data, but robust cross-modal fusion remains challenging. We propose a novel unsupervised framework, Multi-Modal Attention-Driven Fusion Restoration (MAFR), which synthesises a unified latent space from RGB images and point clouds using a shared fusion encoder, followed by attention-guided, modality-specific decoders. Anomalies are localised by measuring reconstruction errors between input features and their restored counterparts. Evaluations on the MVTec 3D-AD and Eyecandies benchmarks demonstrate that MAFR achieves state-of-the-art results, with a mean I-AUROC of 0.972 and 0.901, respectively. The framework also exhibits strong performance in few-shot learning settings, and ablation studies confirm the critical roles of the fusion architecture and composite loss. MAFR offers a principled approach for fusing visual and geometric information, advancing the robustness and accuracy of industrial anomaly detection. Code is available at https://github.com/adabrh/MAFR
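A minimal sketch of the scoring step the abstract describes, in pure Python: anomalies are localised by comparing each input feature vector with its restored counterpart, so locations the decoders fail to reconstruct receive high error. Several details are assumptions for illustration and are not confirmed by the abstract: cosine distance as the reconstruction error, summing the RGB and point-cloud error maps, and taking the maximum as the image-level score. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity: 0 for aligned vectors, up to 2 for opposite ones."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0  # assumption: treat an all-zero feature as uninformative
    return 1.0 - dot / (na * nb)


def anomaly_map(inputs: List[Vector], restored: List[Vector]) -> List[float]:
    """Per-location reconstruction error between input and restored features."""
    if len(inputs) != len(restored):
        raise ValueError("feature maps must have the same number of locations")
    return [cosine_distance(f, r) for f, r in zip(inputs, restored)]


def fused_scores(
    rgb_in: List[Vector],
    rgb_rec: List[Vector],
    pc_in: List[Vector],
    pc_rec: List[Vector],
) -> Tuple[List[float], float]:
    """Hypothetical fusion rule: sum the RGB and point-cloud error maps
    location by location, and use the maximum as the image-level score."""
    rgb_map = anomaly_map(rgb_in, rgb_rec)
    pc_map = anomaly_map(pc_in, pc_rec)
    if len(rgb_map) != len(pc_map):
        raise ValueError("RGB and point-cloud maps must be spatially aligned")
    pixel = [a + b for a, b in zip(rgb_map, pc_map)]
    return pixel, max(pixel)
```

For example, if the RGB feature at the third location is restored as `[1.0, -1.0]` instead of `[1.0, 1.0]` while every other feature is restored exactly, the fused map is `[0.0, 0.0, 1.0]` and the image score is `1.0`, flagging that location. In a real pipeline the feature vectors would come from pretrained extractors and the restored vectors from the modality-specific decoders.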

