CV · Jul 29, 2025

EMIT: Enhancing MLLMs for Industrial Anomaly Detection via Difficulty-Aware GRPO

arXiv:2507.21619v1 · 8 citations · h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses the limited effectiveness of MLLMs in industrial anomaly detection, a domain-specific problem relevant to manufacturing safety. Its contribution is an incremental advance in adaptation methods.

The paper adapts multimodal large language models (MLLMs) to industrial anomaly detection (IAD) with EMIT, a framework that combines difficulty-aware group relative policy optimization (GRPO) with a multi-task IAD dataset, GPT-generated object text descriptions, and soft-prompt and heatmap-guided contrastive embeddings for few-shot detection. It reports an average improvement of 7.77% over the base model (InternVL3-8B) across seven tasks on the MMAD benchmark.

Industrial anomaly detection (IAD) plays a crucial role in maintaining the safety and reliability of manufacturing systems. While multimodal large language models (MLLMs) show strong vision-language reasoning abilities, their effectiveness in IAD remains limited without domain-specific adaptation. In this work, we propose EMIT, a unified framework that enhances MLLMs for IAD via difficulty-aware group relative policy optimization (GRPO). EMIT constructs a multi-task IAD dataset and utilizes GPT-generated object text descriptions to compensate for missing defective images. For few-shot anomaly detection, it integrates a soft prompt and heatmap-guided contrastive embeddings derived from patch-level comparisons. To better handle difficult data samples, i.e., cases where the MLLM struggles to generate correct answers, we propose a difficulty-aware GRPO that extends the original GRPO by incorporating a response resampling strategy to ensure the inclusion of correct answers in the sampled responses, as well as an advantage reweighting mechanism to strengthen learning from such difficult data samples. Extensive experiments on the MMAD benchmark demonstrate that EMIT significantly enhances the IAD performance of MLLMs, achieving an average improvement of 7.77% over the base model (InternVL3-8B) across seven tasks.
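
The two extensions named in the abstract, response resampling and advantage reweighting, can be illustrated with a minimal sketch. This is not the paper's implementation. The group-normalized advantage is the standard GRPO form. Everything else here is an assumption made for illustration: the binary reward (1 for a correct answer, 0 otherwise), the slot-by-slot redraw rule, the `max_resamples` cap, and the `1 + alpha * (1 - accuracy)` weight, along with all function and parameter names.

```python
import statistics
from typing import Callable, List, Tuple


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Standard GRPO advantage: center and scale rewards within one sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def sample_group_with_resampling(
    sample_fn: Callable[[], str],
    is_correct: Callable[[str], bool],
    group_size: int,
    max_resamples: int = 4,  # hypothetical cap, not taken from the paper
) -> Tuple[List[str], int]:
    """Draw a group of responses; while none is correct, redraw one slot at a time.

    Assumed strategy: the abstract only says resampling ensures that a correct
    answer is included, not how the redraw is performed.
    """
    responses = [sample_fn() for _ in range(group_size)]
    resamples = 0
    while resamples < max_resamples and not any(is_correct(r) for r in responses):
        responses[resamples % group_size] = sample_fn()
        resamples += 1
    return responses, resamples


def difficulty_aware_advantages(rewards: List[float], alpha: float = 1.0) -> List[float]:
    """Scale group-relative advantages up when few responses in the group are correct.

    Assumed weighting: weight = 1 + alpha * (1 - group accuracy), with binary
    rewards where any reward > 0 counts as a correct response.
    """
    accuracy = sum(1 for r in rewards if r > 0) / len(rewards)
    weight = 1.0 + alpha * (1.0 - accuracy)
    return [weight * a for a in group_relative_advantages(rewards)]
```

Under these assumptions, a group with one correct response out of four gets a weight of 1.75, so its advantages are larger in magnitude than plain GRPO would assign. A group where every response is correct has zero advantage either way, because there is no within-group reward variation to learn from.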

