LG · Mar 3, 2022

Data-Efficient and Interpretable Tabular Anomaly Detection

Chun-Hao Chang, Jinsung Yoon, Sercan Arik, Madeleine Udell, Tomas Pfister

arXiv:2203.02034v2 · 14.622 citations · h-index: 45

Originality: Incremental advance

AI Analysis

This work addresses the integration of anomaly detection into real-world applications by making detection data-efficient and interpretable for stakeholders, though its contribution is incremental: it adapts an existing model class, Generalized Additive Models.

The paper tackles anomaly detection in tabular data with a focus on data efficiency and interpretability, proposing a framework that, with only 5 labeled anomalies, improves AUC from 86.2% to 89.4% by also learning from unlabeled data.

Anomaly detection (AD) plays an important role in numerous applications. We focus on two understudied aspects of AD that are critical for integration into real-world applications. First, most AD methods cannot incorporate labeled data, which are often available in practice in small quantities and can be crucial for achieving high AD accuracy. Second, most AD methods are not interpretable, a bottleneck that prevents stakeholders from understanding the reasons behind the anomalies. In this paper, we propose a novel AD framework that adapts a white-box model class, Generalized Additive Models, to detect anomalies using a partial identification objective, which naturally handles noisy or heterogeneous features. In addition, the proposed framework, DIAD, can incorporate a small amount of labeled data to further boost anomaly detection performance in semi-supervised settings. We demonstrate the superiority of our framework over previous work in both unsupervised and semi-supervised settings using diverse tabular datasets. For example, with 5 labeled anomalies, DIAD improves from 86.2% to 89.4% AUC by learning AD from unlabeled data. We also present insightful interpretations that explain why DIAD deems certain samples anomalous.
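
To make the interpretability claim concrete, the following is a minimal sketch of an additive anomaly score, in which the total score is a sum of per-feature terms that can each be inspected. It is not DIAD and does not implement the partial identification objective; it swaps in a simple per-feature histogram density (in the spirit of histogram-based outlier scoring) purely for illustration. The function names, the bin count, and the synthetic data are all assumptions.

```python
import numpy as np

def fit_feature_histograms(X, n_bins=10):
    """Fit one histogram per feature; returns a list of (bin_edges, log_probs)."""
    models = []
    for j in range(X.shape[1]):
        counts, edges = np.histogram(X[:, j], bins=n_bins)
        # Laplace smoothing so empty bins still get a finite log-probability.
        probs = (counts + 1.0) / (counts.sum() + n_bins)
        models.append((edges, np.log(probs)))
    return models

def feature_contributions(models, X):
    """Return an (n_samples, n_features) array of per-feature anomaly terms."""
    contribs = np.empty(X.shape, dtype=float)
    for j, (edges, log_probs) in enumerate(models):
        # Interior edges map each value to a bin index in [0, n_bins - 1];
        # values outside the training range fall into the first or last bin.
        idx = np.digitize(X[:, j], edges[1:-1])
        # Rarer bins have lower probability, hence a larger contribution.
        contribs[:, j] = -log_probs[idx]
    return contribs

# Hypothetical data: three independent standard-normal features.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 3))
x_normal = np.zeros((1, 3))
x_odd = np.array([[0.0, 0.0, 3.5]])  # unusual only in feature index 2

models = fit_feature_histograms(X_train)
c_normal = feature_contributions(models, x_normal)
c_odd = feature_contributions(models, x_odd)

# The total score is the sum of the per-feature terms (the additive structure).
score_normal = c_normal.sum()
score_odd = c_odd.sum()

# The largest term points at the feature that drives the anomaly score.
top_feature = int(np.argmax(c_odd[0]))
```

Because the score decomposes feature by feature, `c_odd[0]` can be read directly as an explanation: the term for feature index 2 dominates, which is the kind of per-feature attribution that an additive model class makes available.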
