STML, Sep 20, 2021

Learning to Rank Anomalies: Scalar Performance Criteria and Maximization of Two-Sample Rank Statistics

arXiv:2109.09590v1
Originality: Synthesis-oriented
AI Analysis

This paper addresses outlier detection for applications such as fraud detection and predictive maintenance, but it appears incremental because it builds on existing rank-statistic methods.

The paper tackles outlier detection by learning a data-driven scoring function that reflects the degree of abnormality of each observation. The function is learned through a binary classification problem whose empirical criterion is a two-sample linear rank statistic, and preliminary numerical experiments are reported as encouraging.

The ability to collect and store ever more massive databases has been accompanied by the need to process them efficiently. In many cases, most observations behave similarly, while a presumably small proportion of them are abnormal. Detecting the latter, defined as outliers, is one of the major challenges for machine learning applications (e.g. in fraud detection or in predictive maintenance). In this paper, we propose a methodology addressing the problem of outlier detection by learning a data-driven scoring function, defined on the feature space, which reflects the degree of abnormality of the observations. This scoring function is learnt through a well-designed binary classification problem whose empirical criterion takes the form of a two-sample linear rank statistic, for which theoretical results are available. We illustrate our methodology with encouraging preliminary numerical experiments.
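To make the notion of a two-sample linear rank statistic concrete, here is a minimal sketch of the generic textbook form: pool the scores of two samples, rank them, and sum a score-generating function of the normalized ranks over one of the samples. This is not the authors' implementation. The function name `linear_rank_statistic`, the argument names, the default `phi`, and the toy scores are all hypothetical, and the way the paper builds its two samples for outlier detection is not described in this summary.

```python
import numpy as np


def linear_rank_statistic(scores_pos, scores_neg, phi=lambda u: u):
    """Generic two-sample linear rank statistic (illustrative sketch).

    Pools both samples, ranks the pooled scores (1 = smallest), and sums
    phi(rank / (N + 1)) over the first ("positive") sample, where N is the
    pooled sample size. Assumes there are no ties among the scores.
    """
    scores_pos = np.asarray(scores_pos, dtype=float)
    scores_neg = np.asarray(scores_neg, dtype=float)
    pooled = np.concatenate([scores_pos, scores_neg])
    n_total = pooled.size
    # argsort applied twice yields 0-based ranks; add 1 for 1-based ranks.
    ranks = np.argsort(np.argsort(pooled)) + 1
    pos_ranks = ranks[: scores_pos.size]
    return float(np.sum(phi(pos_ranks / (n_total + 1))))


# Hypothetical scores produced by some scoring function s(x).
pos = [0.9, 0.8, 0.4]       # sample expected to score high
neg = [0.1, 0.3, 0.5, 0.2]  # sample expected to score low

w = linear_rank_statistic(pos, neg)  # phi(u) = u: normalized rank sum
print(w)
```

With `phi(u) = u` the statistic is the Wilcoxon rank-sum statistic divided by N + 1, which is an increasing affine function of the empirical AUC. Other choices of `phi` put more weight on the top of the ranking; maximizing such a statistic over a class of scoring functions is the kind of criterion the abstract refers to.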

