LGMLOct 18, 2018

Unsupervised Anomalous Data Space Specification

arXiv:1810.08309v1
Originality Incremental advance
AI Analysis

This provides a more efficient and verifiable approach to anomaly detection for applications requiring quick and reliable identification of outliers.

The paper tackles the problem of anomaly detection by transforming an existing algorithm to generate a complete specification of anomalous values from a small training set in constant time, enabling validation without exhaustive testing.

Computer algorithms are written with the intent that when run they perform a useful function. Typically any information obtained is unknown until the algorithm is run. However, if the behavior of an algorithm can be fully described by precomputing just once how this algorithm will respond when executed on any input, this precomputed result provides a complete specification for all solutions in the problem domain. We apply this idea to a previous anomaly detection algorithm, and in doing so transform it from one that merely detects individual anomalies when asked to discover potentially anomalous values, into an algorithm also capable of generating a complete specification for those values it would deem to be anomalous. This specification is derived by examining no more than a small training data, can be obtained in very small constant time, and is inherently far more useful than results obtained by repeated execution of this tool. For example, armed with such a specification one can ask how close an anomaly is to being deemed normal, and can validate this answer not by exhaustively testing the algorithm but by examining if the specification so generated is indeed correct. This powerful idea can be applied to any algorithm whose runtime behavior can be recovered from its construction and so has wide applicability.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes