Statistical Inference Using Mean Shift Denoising
This work addresses data preprocessing challenges for researchers and practitioners in statistics and machine learning, offering incremental improvements to existing methods.
The paper tackles the problem of denoising datasets by analyzing the mean shift algorithm as a distribution operator, showing that it concentrates data points around high-density regions, which enhances clustering techniques, improves two-sample test power, and provides a new anomaly detection method.
In this paper, we study how the mean shift algorithm can be used to denoise a dataset. We introduce a new framework that analyzes mean shift as a denoising approach by viewing the algorithm as an operator on a distribution function. We investigate how this operator changes the distribution and show that the shifted data points concentrate around high-density regions of the underlying density function. Using mean shift as a denoising method, we enhance the performance of several clustering techniques, improve the power of two-sample tests, and obtain a new method for anomaly detection.
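As a minimal illustration of the denoising idea described above, the following Python sketch moves every data point to a kernel-weighted mean of the original sample, repeated for a few steps. It is not the paper's implementation: the Gaussian kernel, the bandwidth value, the number of steps, and the function names `mean_shift_step` and `mean_shift_denoise` are all assumptions made for this example.

```python
import numpy as np


def mean_shift_step(points, data, bandwidth):
    """Move each row of `points` to the Gaussian-weighted mean of `data`.

    Assumption: a Gaussian kernel with a single scalar bandwidth.
    """
    # Pairwise squared distances, shape (n_points, n_data).
    diff = points[:, None, :] - data[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Kernel weights: nearby data points contribute more to the mean.
    weights = np.exp(-0.5 * sq_dist / bandwidth ** 2)
    # Weighted average of the data for every query point.
    return weights @ data / weights.sum(axis=1, keepdims=True)


def mean_shift_denoise(data, bandwidth, n_steps=1):
    """Return a shifted copy of `data` after `n_steps` mean shift updates.

    The kernel weights are always computed against the original sample,
    so the underlying density estimate stays fixed while the points move.
    """
    data = np.asarray(data, dtype=float)
    shifted = data.copy()
    for _ in range(n_steps):
        shifted = mean_shift_step(shifted, data, bandwidth)
    return shifted


if __name__ == "__main__":
    # Hypothetical toy data: two noisy 2-D clusters.
    rng = np.random.default_rng(0)
    cluster_a = rng.normal(loc=-3.0, scale=1.0, size=(100, 2))
    cluster_b = rng.normal(loc=3.0, scale=1.0, size=(100, 2))
    sample = np.vstack([cluster_a, cluster_b])

    denoised = mean_shift_denoise(sample, bandwidth=1.0, n_steps=3)
    print("spread of first cluster before:", sample[:100].std(axis=0))
    print("spread of first cluster after: ", denoised[:100].std(axis=0))
```

In this sketch the shifted sample, rather than the raw one, would be passed to a downstream clustering method, two-sample test, or anomaly score. The bandwidth controls how strongly points are pulled toward high-density regions and would need to be chosen for the data at hand.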