ML · LG · AP · Oct 6, 2022

Anomaly detection using data depth: multivariate case

arXiv:2210.02851v2 · 10 citations · h-index: 18
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of identifying anomalies such as fraud or equipment failures for applications in science and industry, but it appears incremental as it builds on existing data depth methods.

The paper tackles anomaly detection in multivariate data by using data depth to label observations with lower depth values as abnormal, discussing practical aspects like invariances, robustness, and computational complexity, with illustrations showing its advantageous behavior in various settings.

Anomaly detection is a branch of data analysis and machine learning that aims to identify observations exhibiting abnormal behaviour. Be it measurement errors, disease development, severe weather, production quality defects or failed equipment, financial fraud or crisis events, their timely identification, isolation and explanation constitute an important task in almost any branch of science and industry. By providing a robust ordering, data depth, a statistical function that measures how well any point of the space belongs to a data set, becomes a particularly useful tool for detecting anomalies. Already known for its theoretical properties, data depth has undergone substantial computational development in the last decade, and particularly in recent years, which has made it applicable to contemporary-sized problems of data analysis and machine learning. In this article, data depth is studied as an efficient anomaly detection tool in a multivariate setting, assigning abnormality labels to observations with lower depth values. The practical questions discussed are the necessity and reasonableness of invariances, the shape of the depth function, its robustness and computational complexity, and the choice of the threshold. Illustrations include use cases that underline the advantageous behaviour of data depth in various settings.
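
The labelling rule described in the abstract (observations with lower depth values are marked abnormal) can be illustrated with a minimal sketch. The abstract does not say which depth notion or threshold rule the paper uses, so the example assumes Mahalanobis depth, one of the simplest depth functions, and a hypothetical quantile-based threshold `alpha`. The function names `mahalanobis_depth` and `flag_anomalies` are illustrative and do not come from the paper. Because this version relies on the sample mean and covariance, it is not robust to contaminated training data.

```python
import numpy as np


def mahalanobis_depth(points, data):
    """Mahalanobis depth of each row of `points` with respect to `data`.

    D(x) = 1 / (1 + (x - mu)^T S^{-1} (x - mu)), where mu and S are the
    sample mean and covariance of `data`. Values lie in (0, 1]; a lower
    value means the point is more outlying.
    """
    mu = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)
    diff = points - mu
    # Solve S z = diff^T rather than forming the inverse explicitly.
    d2 = np.einsum("ij,ji->i", diff, np.linalg.solve(cov, diff.T))
    return 1.0 / (1.0 + d2)


def flag_anomalies(points, data, alpha=0.05):
    """Flag points whose depth is below the alpha-quantile of training depths.

    `alpha` is an assumed tuning parameter, not a value from the paper.
    """
    threshold = np.quantile(mahalanobis_depth(data, data), alpha)
    depths = mahalanobis_depth(points, data)
    return depths < threshold, depths, threshold


# Synthetic illustration: a 2-D Gaussian cloud, one central and one remote point.
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 2))
new_points = np.array([[0.0, 0.0], [6.0, -6.0]])
flags, depths, threshold = flag_anomalies(new_points, train)
```

In this sketch the central point receives a depth close to 1 and the remote point a depth close to 0, so only the latter falls below the threshold. Other depth notions would slot into the same thresholding step, at different computational cost and with different robustness.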
