CV · Dec 6, 2024

MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects

Lei Fan, Dongdong Fan, Zhiguang Hu, Yiwen Ding, Donglin Di, Kai Yi, Maurice Pagnucco, Yang Song

arXiv:2412.04867v1 · 17.330 citations · h-index: 24 · CVPR

Originality: Synthesis-oriented

AI Analysis

This dataset addresses anomaly detection for tiny objects across multiple domains and gives computer vision and machine learning researchers a comprehensive resource, though the contribution is incremental in that it builds on existing dataset creation efforts.

The authors introduced MANTA, a large-scale multi-view visual-text dataset for anomaly detection in tiny objects, containing over 137.3K images with 8.6K pixel-level anomalous annotations and text components including declarative knowledge and constructivist learning questions. They proposed a baseline and conducted benchmarking experiments to demonstrate the dataset's challenges and utility.

We present MANTA, a visual-text anomaly detection dataset for tiny objects. The visual component comprises over 137.3K images across 38 object categories spanning five typical domains, of which 8.6K images are labeled as anomalous with pixel-level annotations. Each image is captured from five distinct viewpoints to ensure comprehensive object coverage. The text component consists of two subsets: Declarative Knowledge, including 875 words that describe common anomalies across various domains and specific categories, with detailed explanations for <what, why, how>, including causes and visual characteristics; and Constructivist Learning, providing 2K multiple-choice questions with varying levels of difficulty, each paired with images and corresponding answer explanations. We also propose a baseline for visual-text tasks and conduct extensive benchmarking experiments to evaluate advanced methods across different settings, highlighting the challenges and efficacy of our dataset.
