Quality-Aware Language-Conditioned Local Auto-Regressive Anomaly Synthesis and Detection
This work addresses the need for more realistic and controllable anomaly synthesis in industrial inspection, offering incremental improvements over existing methods.
The paper addresses structural deficiencies in existing anomaly synthesis methods by introducing ARAS, a language-conditioned auto-regressive approach that injects text-specified defects into normal images. ARAS is integrated into the QARAD framework for anomaly detection, which achieves improved accuracy and a 5× synthesis speedup over diffusion-based methods.
Despite substantial progress in anomaly synthesis, existing diffusion-based and coarse inpainting pipelines commonly suffer from structural deficiencies such as micro-structural discontinuities, limited semantic controllability, and inefficient generation. To overcome these limitations, we introduce ARAS, a language-conditioned, auto-regressive anomaly synthesis approach that precisely injects local, text-specified defects into normal images via token-anchored latent editing. Leveraging a hard-gated auto-regressive operator and a training-free, context-preserving masked sampling kernel, ARAS significantly enhances defect realism, preserves fine-grained material textures, and provides continuous semantic control over synthesized anomalies. We integrate ARAS into our Quality-Aware Re-weighted Anomaly Detection (QARAD) framework and further propose a dynamic weighting strategy that emphasizes high-quality synthetic samples by computing an image-text similarity score with a dual-encoder model. Extensive experiments on three benchmark datasets (MVTec AD, VisA, and BTAD) demonstrate that QARAD outperforms state-of-the-art methods in both image- and pixel-level anomaly detection, achieving improved accuracy and robustness, along with a 5× synthesis speedup over diffusion-based alternatives. Our complete code and synthesized dataset will be made publicly available.
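
To make the quality-aware re-weighting idea concrete, the following is a minimal sketch rather than the QARAD implementation. The abstract does not specify the weighting function, so the softmax form, the `temperature` parameter, and the function names `quality_weights` and `reweighted_loss` are all assumptions made for illustration. The image-text similarity scores are taken as precomputed inputs (for example, cosine similarities from some dual-encoder model).

```python
import math


def quality_weights(similarities, temperature=0.1):
    """Map image-text similarity scores to per-sample weights.

    Assumption: a temperature-scaled softmax, rescaled so the weights
    average to 1. The source only says that higher-quality synthetic
    samples are emphasized; this particular form is hypothetical.
    """
    scaled = [s / temperature for s in similarities]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    n = len(similarities)
    return [n * e / total for e in exps]


def reweighted_loss(per_sample_losses, similarities, temperature=0.1):
    """Weighted mean of per-sample losses using the quality weights."""
    weights = quality_weights(similarities, temperature)
    weighted = sum(w * l for w, l in zip(weights, per_sample_losses))
    return weighted / len(per_sample_losses)
```

Under this assumed form, samples whose synthesized defect matches its text prompt more closely contribute more to the training loss, and when all similarity scores are equal the result reduces to the ordinary mean loss.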