LG · AI · May 25

Rethinking Weak Supervision in Anomaly Detection: A Comprehensive Benchmark

Xu Yao, Siyuan Zhou, Wu Zhenbo, Chaochuan Hou, Shuang Liang, Shiping Wang, Hailiang Huang, Songqiao Han, Minqi Jiang

arXiv:2605.26068 · 83.3 · Has Code

AI Analysis

This benchmark provides a standardized evaluation framework for the fragmented field of weakly supervised anomaly detection, offering critical insights for practitioners and researchers.

The paper introduces WSADBench, the first unified benchmark for weakly supervised anomaly detection across incomplete, inexact, and inaccurate supervision. Based on over 700K experiments evaluating 36 algorithms across 4 modalities, it reveals that specialized WSAD methods excel only in extreme label scarcity, while tabular foundation models dominate with more supervision, and unlabeled data provides marginal gains compared to label refinement.

Weakly supervised anomaly detection (WSAD) has developed in three primary directions: incomplete, inexact, and inaccurate supervision. However, these directions remain isolated, lacking a unified framework to assess whether they address unique challenges or share fundamental mechanics. This paper introduces WSADBench, the first benchmark that unifies evaluation across distinct weakly supervised scenarios, benchmarking diverse approaches from specialized WSAD methods to advanced tabular foundation models. WSADBench establishes standardized protocols to evaluate 36 algorithms across 4 modalities by systematically varying label quantity, granularity, and quality, revealing the performance boundaries of various methods. Based on over 700K experiments, WSADBench reveals four critical insights: (i) Strong intrinsic correlations exist between these weak supervision scenarios, challenging the isolation of current research directions. (ii) Specialized WSAD algorithms excel only in extreme label-scarcity regimes but are quickly dominated by tabular foundation models and general classification methods as supervision increases or in OOD scenarios. (iii) Unlabeled data shows inconsistent utility across settings, with marginal gains compared to label refinement. (iv) Models exhibit asymmetric sensitivity to different types of label noise. We release WSADBench as an open-source benchmark with code and datasets to facilitate future WSAD research: https://github.com/SUFE-AILAB/WSADBench.
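To make the three supervision types concrete, the sketch below simulates each one from a fully labeled toy dataset by varying label quantity, granularity, and quality. It is an illustration only, not WSADBench's actual protocol: the function names (`make_incomplete`, `make_inexact`, `make_inaccurate`), the parameters, and the class-dependent flip rates used to model different noise types are all assumptions made for this example.

```python
import random


def make_incomplete(labels, labeled_ratio, rng):
    """Incomplete supervision (label quantity): reveal only some anomaly labels.

    Returns a list where 1 marks a labeled anomaly and None marks an
    unlabeled sample (either a normal sample or a hidden anomaly).
    """
    anomaly_idx = [i for i, y in enumerate(labels) if y == 1]
    # Keep at least one labeled anomaly when any anomaly exists.
    n_keep = max(1, round(labeled_ratio * len(anomaly_idx))) if anomaly_idx else 0
    kept = set(rng.sample(anomaly_idx, n_keep))
    return [1 if i in kept else None for i in range(len(labels))]


def make_inexact(labels, bag_size):
    """Inexact supervision (label granularity): only bag-level labels remain.

    A bag is labeled 1 if it contains at least one anomaly.
    """
    bags = [labels[i:i + bag_size] for i in range(0, len(labels), bag_size)]
    return [int(any(bag)) for bag in bags]


def make_inaccurate(labels, flip_anomaly, flip_normal, rng):
    """Inaccurate supervision (label quality): flip labels at per-class rates."""
    noisy = []
    for y in labels:
        rate = flip_anomaly if y == 1 else flip_normal
        noisy.append(1 - y if rng.random() < rate else y)
    return noisy


# Hypothetical toy data: 100 samples, 10 of them anomalies.
rng = random.Random(0)
truth = [0] * 90 + [1] * 10
rng.shuffle(truth)

incomplete = make_incomplete(truth, labeled_ratio=0.2, rng=rng)
inexact = make_inexact(truth, bag_size=10)
inaccurate = make_inaccurate(truth, flip_anomaly=0.3, flip_normal=0.05, rng=rng)
```

Sweeping `labeled_ratio`, `bag_size`, and the two flip rates over a grid is one simple way to probe how a detector degrades along each axis. Setting `flip_anomaly` and `flip_normal` to different values gives a rough handle on the asymmetric noise sensitivity mentioned in insight (iv).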
