LG | ML | Oct 15, 2019

MSD-Kmeans: A Novel Algorithm for Efficient Detection of Global and Local Outliers

arXiv:1910.06588v1 | 17 citations
Originality: Synthesis-oriented
AI Analysis

The paper addresses outlier detection for data mining applications, such as detecting taxi fare overcharging. Its contribution is incremental: it combines existing statistical and clustering techniques rather than introducing a new one.

The paper proposes MSD-Kmeans, a hybrid outlier detection algorithm that combines Mean and Standard Deviation (MSD) with K-means. In the authors' evaluation, it achieved the highest precision, accuracy, and F-measure among the compared methods: MSD, K-means, Z-score, MIQR, and LOF.

Outlier detection is a data mining technique that aims to detect unusual or unexpected records in a dataset. Existing outlier detection algorithms have different strengths and weaknesses and differ in their sensitivity to noisy data such as extreme values. In this paper, we propose a novel cluster-based outlier detection algorithm named MSD-Kmeans that combines the statistical method of Mean and Standard Deviation (MSD) with the machine learning clustering algorithm K-means to detect outliers more accurately and with better control of extreme values. MSD-Kmeans has two phases: (1) applying the MSD algorithm to eliminate as much noisy data as possible, minimizing its interference with the clusters, and (2) applying the K-means algorithm to obtain locally optimal clusters. We evaluate our algorithm and demonstrate its effectiveness in the context of detecting possible overcharging of taxi fares, as dishonest drivers may attempt to charge higher fares by taking detours. We compare the performance indicators of MSD-Kmeans with those of other outlier detection algorithms, such as MSD, K-means, Z-score, MIQR, and LOF, and show that the proposed MSD-Kmeans algorithm achieves the highest precision, accuracy, and F-measure. We conclude that MSD-Kmeans can be used for effective and efficient outlier detection on data of varying quality on IoT devices.
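The two phases described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state thresholds, the number of clusters, or how outliers are read off the clusters, so everything below is an assumption made for illustration. In particular, the function names (`msd_filter`, `kmeans_1d`, `msd_kmeans`), the 3-sigma global cutoff, the 2.5-sigma per-cluster cutoff, the quantile-based seeding, and the sample fares are all hypothetical.

```python
from statistics import mean, pstdev


def msd_filter(values, n_sigma=3.0):
    """Phase 1 (MSD): split values into (kept, flagged) by distance from the mean."""
    mu, sigma = mean(values), pstdev(values)
    kept, flagged = [], []
    for v in values:
        is_extreme = sigma > 0 and abs(v - mu) > n_sigma * sigma
        (flagged if is_extreme else kept).append(v)
    return kept, flagged


def assign(values, centroids):
    """Group each value with its nearest centroid."""
    clusters = [[] for _ in centroids]
    for v in values:
        nearest = min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
        clusters[nearest].append(v)
    return clusters


def kmeans_1d(values, k, n_iter=100):
    """Phase 2: plain Lloyd's K-means on 1-D data (assumes len(values) >= k)."""
    ordered = sorted(values)
    # Assumed seeding: evenly spaced positions in the sorted data, so runs are repeatable.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(n_iter):
        clusters = assign(values, centroids)
        updated = [mean(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(values, centroids)


def msd_kmeans(values, k=2, global_sigma=3.0, local_sigma=2.5):
    """Return (global_outliers, local_outliers) under the assumptions stated above."""
    kept, global_outliers = msd_filter(values, global_sigma)
    centroids, clusters = kmeans_1d(kept, k)
    local_outliers = []
    for centre, members in zip(centroids, clusters):
        spread = pstdev(members) if len(members) > 1 else 0.0
        # Assumed rule: a point is a local outlier if it lies far from its own centroid.
        local_outliers += [v for v in members
                           if spread > 0 and abs(v - centre) > local_sigma * spread]
    return sorted(global_outliers), sorted(local_outliers)


# Made-up fares: short trips near 10, long trips near 40, one suspicious short
# trip (14.0) and one extreme value (500.0).
fares = [9.5, 10.0, 10.5, 10.0, 9.8, 10.2, 10.0, 9.9, 10.1, 10.0, 14.0,
         39.5, 40.0, 40.5, 40.0, 39.8, 40.2, 40.0, 39.9, 40.1, 40.0, 500.0]
global_out, local_out = msd_kmeans(fares, k=2)
print(global_out, local_out)  # [500.0] [14.0]
```

In this toy run the extreme fare is removed before clustering, so it cannot pull a centroid toward itself. The 14.0 fare, which looks unremarkable against the full dataset, then stands out within the short-trip cluster. That ordering is the motivation the abstract gives for running MSD first.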

