CV | AI | Nov 13, 2025

Anomagic: Crossmodal Prompt-driven Zero-shot Anomaly Generation

arXiv:2511.10020v1 | 8 citations | h-index: 11
Originality: Incremental advance
AI Analysis

This work addresses the need for versatile anomaly generation in computer vision. It offers a foundation model that can synthesize anomalies for any normal-category image, though the advance is incremental because it builds on existing inpainting and multimodal techniques.

The authors tackle the problem of generating realistic anomalies without exemplars. Their zero-shot method, Anomagic, combines crossmodal prompts with contrastive refinement. It produces more varied and realistic anomalies than prior methods, which in turn improves downstream anomaly detection accuracy.

We propose Anomagic, a zero-shot anomaly generation method that produces semantically coherent anomalies without requiring any exemplar anomalies. By unifying both visual and textual cues through a crossmodal prompt encoding scheme, Anomagic leverages rich contextual information to steer an inpainting-based generation pipeline. A subsequent contrastive refinement strategy enforces precise alignment between synthesized anomalies and their masks, thereby bolstering downstream anomaly detection accuracy. To facilitate training, we introduce AnomVerse, a collection of 12,987 anomaly-mask-caption triplets assembled from 13 publicly available datasets, where captions are automatically generated by multimodal large language models using structured visual prompts and template-based textual hints. Extensive experiments demonstrate that Anomagic trained on AnomVerse can synthesize more realistic and varied anomalies than prior methods, yielding superior improvements in downstream anomaly detection. Furthermore, Anomagic can generate anomalies for any normal-category image using user-defined prompts, establishing a versatile foundation model for anomaly generation.
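The abstract's two central data ideas, an anomaly-mask-caption triplet and a mask that delimits where the inpainted anomaly may appear, can be illustrated with a small Python sketch. This is not the authors' code or the AnomVerse format: the names `AnomalyTriplet`, `composite`, and `mask_area_fraction` are hypothetical, images are plain nested lists rather than tensors, and the hard per-pixel compositing rule is a generic property of mask-guided inpainting assumed here for illustration only.

```python
from dataclasses import dataclass
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities
Mask = List[List[int]]     # 1 marks anomalous pixels, 0 marks normal ones


@dataclass(frozen=True)
class AnomalyTriplet:
    """One hypothetical anomaly-mask-caption record (not the AnomVerse schema)."""
    image: Image
    mask: Mask
    caption: str

    def __post_init__(self) -> None:
        # The mask must cover the image pixel for pixel.
        if len(self.image) != len(self.mask) or any(
            len(img_row) != len(mask_row)
            for img_row, mask_row in zip(self.image, self.mask)
        ):
            raise ValueError("image and mask must have the same shape")


def composite(normal: Image, generated: Image, mask: Mask) -> Image:
    """Keep normal pixels outside the mask, take generated pixels inside it."""
    return [
        [g if m else n for n, g, m in zip(n_row, g_row, m_row)]
        for n_row, g_row, m_row in zip(normal, generated, mask)
    ]


def mask_area_fraction(mask: Mask) -> float:
    """Fraction of pixels the mask marks as anomalous."""
    total = sum(len(row) for row in mask)
    return sum(map(sum, mask)) / total if total else 0.0


# Toy 2x2 example: only the top-right pixel is allowed to change.
normal = [[0.0, 0.0], [0.0, 0.0]]
generated = [[1.0, 1.0], [1.0, 1.0]]
mask = [[0, 1], [0, 0]]

triplet = AnomalyTriplet(
    image=composite(normal, generated, mask),
    mask=mask,
    caption="a hypothetical caption describing the anomaly",
)
```

In this toy case the composited image differs from the normal image only where the mask is 1, which is the kind of anomaly-mask agreement that the contrastive refinement step is described as enforcing. How Anomagic achieves that alignment inside its generation pipeline is not specified in the abstract and is not modeled here.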

