A Typology of Data Anomalies
This paper addresses the problem of vague anomaly conceptualizations faced by researchers and practitioners in data analysis. It offers a foundational tool, although the contribution is incremental in that it builds on existing typologies.
The paper tackles the lack of a clear and applicable framework for categorizing data anomalies by introducing a general typology that provides tangible definitions and facilitates the evaluation of anomaly detection algorithms. It does not report concrete numerical results.
Anomalies are cases that are in some way unusual and do not appear to fit the general patterns present in the dataset. Several conceptualizations exist to distinguish between different types of anomalies. However, these are either too specific to be generally applicable or so abstract that they neither provide concrete insight into the nature of anomaly types nor facilitate the functional evaluation of anomaly detection algorithms. With the recent criticism of 'black box' algorithms and analytics, it has become clear that this is an undesirable situation. This paper therefore introduces a general typology of anomalies that offers a clear and tangible definition of the different types of anomalies in datasets. The typology also facilitates the evaluation of the functional capabilities of anomaly detection algorithms and, as a framework, assists in analyzing the conceptual levels of data, patterns and anomalies. Finally, it serves as an analytical tool for studying anomaly types from other typologies.