LG AI ML · Sep 9, 2019

Outlier Detection in High Dimensional Data

arXiv:1909.03681v1 · 8 citations

Originality: Incremental advance

AI Analysis

This work addresses a specific challenge in data mining for high-dimensional datasets, offering an incremental improvement over existing methods.

The paper tackles outlier detection in high-dimensional data, where existing algorithms often fail, especially with small sample sizes. The proposed method, based on principal component analysis and kernel density estimation, outperforms the benchmark methods in F1-score and achieves better-than-average execution times on synthetic and real-life data.

High-dimensional data poses unique challenges in the outlier detection process. Most existing algorithms fail to properly address the issues stemming from a large number of features. In particular, outlier detection algorithms perform poorly on small data sets with a large number of features. In this paper, we propose a novel outlier detection algorithm based on principal component analysis and kernel density estimation. The proposed method is designed to address the challenges of high-dimensional data by projecting the original data onto a lower-dimensional space and using the innate structure of the data to calculate an anomaly score for each data point. Numerical experiments on synthetic and real-life data show that our method performs well on high-dimensional data. In particular, the proposed method outperforms the benchmark methods as measured by the $F_1$-score. Our method also produces better-than-average execution times compared to the benchmark methods.
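The abstract names two stages: project the data onto a lower-dimensional space with principal component analysis, then score each point with kernel density estimation. The sketch below shows one way those stages could fit together in Python with NumPy, plus the $F_1$-score used for evaluation. It is an illustration under stated assumptions, not the authors' algorithm: the Gaussian kernel, the Scott's-rule bandwidth, the leave-one-out density, negative log density as the anomaly score, the number of components `k`, and flagging the top-scoring points under a known outlier count are all choices made for this example, and the helper names (`pca_project`, `kde_anomaly_scores`, `f1_score`) and the synthetic data are hypothetical.

```python
import numpy as np


def pca_project(X, k):
    """Center X and project it onto its top-k principal components via SVD."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def kde_anomaly_scores(Z):
    """Leave-one-out Gaussian KDE on the projected data Z (n x d).

    Returns -log(density) per point, so a higher score means more anomalous.
    Assumption: per-dimension bandwidth from Scott's rule, h_j = std_j * n^(-1/(d+4)).
    """
    n, d = Z.shape
    h = Z.std(axis=0, ddof=1) * n ** (-1.0 / (d + 4))
    h = np.where(h > 0, h, 1.0)  # guard against zero-variance components
    U = Z / h  # rescale so each dimension uses a unit bandwidth
    sq_dist = ((U[:, None, :] - U[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-0.5 * sq_dist)
    np.fill_diagonal(K, 0.0)  # leave-one-out: a point does not count toward its own density
    norm = (n - 1) * np.prod(h) * (2 * np.pi) ** (d / 2)
    density = K.sum(axis=1) / norm
    return -np.log(density + 1e-300)  # tiny offset avoids log(0)


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN) for boolean label arrays (True = outlier)."""
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Hypothetical synthetic data: 60 samples, 200 features, hidden 3-dim structure.
rng = np.random.default_rng(0)
n_in, p = 55, 200
W = rng.normal(size=(3, p))
latent_in = rng.normal(size=(n_in, 3))
# Five outliers lie on the same structure but far from the bulk of the data.
latent_out = 12.0 * np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 1]])
latent = np.vstack([latent_in, latent_out])
X = latent @ W + 0.1 * rng.normal(size=(latent.shape[0], p))
y_true = np.arange(len(X)) >= n_in

scores = kde_anomaly_scores(pca_project(X, k=3))
# Assumption: the number of outliers (5) is known; flag the 5 highest scores.
y_pred = np.zeros(len(X), dtype=bool)
y_pred[np.argsort(scores)[-5:]] = True
print("F1:", f1_score(y_true, y_pred))
```

The synthetic set mimics the small-sample, many-feature setting the paper targets (60 points, 200 features). Leave-one-out is used because, with few samples, each point's own kernel would otherwise inflate the density of isolated points and partly mask them.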
