LG · CO · Feb 6, 2025

Finding Pegasus: Enhancing Unsupervised Anomaly Detection in High-Dimensional Data using a Manifold-Based Approach

arXiv:2502.04310v1
h-index: 115
Originality: Incremental advance
AI Analysis

This paper addresses anomaly detection in high-dimensional datasets for applications such as data mining. It offers an incremental improvement through a new way of combining existing detection methods.

The paper tackles unsupervised anomaly detection (AD) in high-dimensional data by analyzing it from the perspective of the manifold created during dimensionality reduction, and introduces a framework that categorizes AD methods and their results as 'on manifold' or 'off manifold'. Using this categorization to combine AD methods improves recall by up to 16% on MNIST data, without sacrificing precision, relative to the baseline that uses the best standalone method (Isolation Forest).

Unsupervised machine learning methods are well suited to searching for anomalies at scale but can struggle with the high-dimensional representation of many modern datasets, hence dimensionality reduction (DR) is often performed first. In this paper we analyse unsupervised anomaly detection (AD) from the perspective of the manifold created in DR. We present an idealised illustration, "Finding Pegasus", and a novel formal framework with which we categorise AD methods and their results into "on manifold" and "off manifold". We define these terms and show how they differ. We then use this insight to develop an approach of combining AD methods which significantly boosts AD recall without sacrificing precision in situations employing high DR. When tested on MNIST data, our approach of combining AD methods improves recall by as much as 16 percent compared with simply combining with the best standalone AD method (Isolation Forest), a result which shows great promise for its application to real-world data.
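The on-manifold / off-manifold distinction can be illustrated with a minimal sketch. It is not the paper's implementation: PCA stands in for the DR step, simple distance scores stand in for AD methods such as Isolation Forest, and combining by a union of flags is an assumption, since the abstract does not state the combination rule. All function names, the synthetic data, and the 0.99 quantile threshold are hypothetical.

```python
import numpy as np

def fit_pca(X, k):
    """Return the data mean and the top-k principal directions (as rows)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def manifold_scores(X, mean, components):
    """Score each row within the reduced space (on) and away from it (off)."""
    Z = (X - mean) @ components.T                    # latent coordinates
    X_hat = Z @ components + mean                    # projection back onto the manifold
    on = np.linalg.norm(Z / Z.std(axis=0), axis=1)   # standardized latent distance
    off = np.linalg.norm(X - X_hat, axis=1)          # reconstruction error
    return on, off

def flag_top(scores, q=0.99):
    """Flag rows whose score exceeds the q-quantile (assumed threshold)."""
    return scores > np.quantile(scores, q)

# Synthetic data (assumption): 500 inliers near a 2-D plane embedded in 10-D.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 10))
inliers = rng.normal(size=(500, 2)) @ W + 0.01 * rng.normal(size=(500, 10))

# One anomaly far out but inside the plane, one close to the centre but off the plane.
on_anomaly = np.array([8.0, 8.0]) @ W
Q, _ = np.linalg.qr(W.T)                 # orthonormal basis of the plane
v = rng.normal(size=10)
v -= Q @ (Q.T @ v)                       # keep only the part orthogonal to the plane
off_anomaly = 3.0 * v / np.linalg.norm(v)

X = np.vstack([inliers, on_anomaly, off_anomaly])

mean, components = fit_pca(X, k=2)
on, off = manifold_scores(X, mean, components)
flag_on, flag_off = flag_top(on), flag_top(off)
combined = flag_on | flag_off            # union of the two detectors (assumed rule)
```

In this synthetic setup each detector flags only one of the two planted points, while the union flags both, which is the intuition behind combining an on-manifold method with an off-manifold one. The union also inherits the false positives of both detectors, so any precision claim depends on how the thresholds are chosen.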

