DB, AI, LG, ML | Aug 27, 2020

The Impact of Discretization Method on the Detection of Six Types of Anomalies in Datasets

arXiv:2008.12330v1 | 9 citations
Originality: Synthesis-oriented
AI Analysis

This work helps practitioners tune anomaly detection by showing that the choice of discretization method affects which anomaly types are detected. The contribution is incremental, as it builds on an existing typology and existing methods.

The study investigated how different discretization methods affect the detection of six types of anomalies in datasets. It found that standard SECODA can detect all six types, but that specific discretization methods favor certain anomaly types.

Anomaly detection is the process of identifying cases, or groups of cases, that are in some way unusual and do not fit the general patterns present in the dataset. Numerous algorithms use discretization of numerical data in their detection processes. This study investigates the effect of the discretization method on the unsupervised detection of each of the six anomaly types acknowledged in a recent typology of data anomalies. To this end, experiments are conducted with various datasets and SECODA, a general-purpose algorithm for unsupervised non-parametric anomaly detection in datasets with numerical and categorical attributes. This algorithm employs discretization of continuous attributes, exponentially increasing weights and discretization cut points, and a pruning heuristic to detect anomalies with an optimal number of iterations. The results demonstrate that standard SECODA can detect all six types, but that different discretization methods favor the discovery of certain anomaly types. The main findings also hold for other detection techniques using discretization.
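To make the role of discretization concrete, here is a minimal sketch of how the choice of cut points can change which cases look unusual. It is not SECODA and does not reproduce the paper's experiments. It is a simple bin-count score on a single attribute, and the function names (`equal_width_edges`, `equal_frequency_edges`, `bin_count_scores`) are hypothetical. Equal-width and equal-frequency binning are used here only as two common discretization methods; the abstract does not name the methods the study compares.

```python
import numpy as np

def equal_width_edges(x, bins):
    """Cut points that split the value range into intervals of equal width."""
    return np.linspace(x.min(), x.max(), bins + 1)

def equal_frequency_edges(x, bins):
    """Cut points at quantiles, so each interval holds roughly the same number of cases."""
    return np.quantile(x, np.linspace(0.0, 1.0, bins + 1))

def bin_count_scores(x, edges):
    """Score each case by how many cases share its interval (lower = more unusual)."""
    # Use interior edges only, so every value maps to an interval index 0..bins-1.
    idx = np.searchsorted(edges[1:-1], x, side="right")
    counts = np.bincount(idx, minlength=len(edges) - 1)
    return counts[idx]

# Illustrative data (an assumption, not from the paper):
# 500 ordinary cases plus one isolated extreme value at the end.
rng = np.random.default_rng(0)
x = np.append(rng.normal(0.0, 1.0, 500), 8.0)

width_scores = bin_count_scores(x, equal_width_edges(x, 10))
freq_scores = bin_count_scores(x, equal_frequency_edges(x, 10))

print("equal-width score of the extreme case:    ", width_scores[-1])
print("equal-frequency score of the extreme case:", freq_scores[-1])
```

With equal-width intervals the extreme value sits alone in a sparse interval and gets a low count. With equal-frequency intervals every interval is about equally populated by construction, so the same case is no longer distinguished by its count. This toy case only illustrates the general idea that a discretization method can favor or hide a particular kind of anomaly.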

Code Implementations: 1 repo