CV | AI | Apr 13

MMR-AD: A Large-Scale Multimodal Dataset for Benchmarking General Anomaly Detection with Multimodal Large Language Models

Xincheng Yao, Zefeng Qian, Chao Shi, Jiayang Song, Chongyang Zhang

arXiv:2604.10971 | 59.4 | h-index: 5

AI Analysis

For researchers in industrial anomaly detection, this work provides a benchmark and baseline model to advance MLLM-based general anomaly detection, though the gains are incremental over existing generalist MLLMs.

The paper introduces MMR-AD, a large-scale multimodal dataset for training and evaluating Multimodal Large Language Models (MLLMs) on general anomaly detection (GAD). It shows that current generalist MLLMs perform poorly on GAD and proposes Anomaly-R1, a baseline model that achieves significant improvements over them in anomaly detection and localization.

In industrial anomaly detection (AD), general anomaly detection (GAD) is an emerging trend and also the ultimate goal. Unlike conventional single- and multi-class AD, general AD aims to train a single model that can directly detect anomalies in diverse novel classes without any retraining or fine-tuning on the target data. Recently, Multimodal Large Language Models (MLLMs) have shown great promise for general anomaly detection owing to their strong visual understanding and language reasoning capabilities. However, the general AD ability of MLLMs remains underexplored for two reasons: (1) MLLMs are pretrained on large amounts of data sourced from the Web, which still differ significantly from the data in AD scenarios, and the image-text pairs used during pretraining are not specifically designed for AD tasks. (2) The current mainstream AD datasets are image-based and not yet suitable for post-training MLLMs. To facilitate MLLM-based general AD research, we present MMR-AD, a comprehensive benchmark for both training and evaluating MLLM-based AD models. With MMR-AD, we reveal that the AD performance of current state-of-the-art (SOTA) generalist MLLMs still falls far short of industrial requirements. Based on MMR-AD, we also propose a baseline model, Anomaly-R1, a reasoning-based AD model that learns from the chain-of-thought (CoT) data in MMR-AD and is further enhanced by reinforcement learning. Extensive experiments show that Anomaly-R1 achieves remarkable improvements over generalist MLLMs in both anomaly detection and localization.
