ML, LG · Jan 25, 2025

Median of Forests for Robust Density Estimation

arXiv:2501.15157v1 · h-index: 16
Originality: Incremental advance
AI Analysis

This paper addresses robust density estimation for data analysts working with contaminated datasets. It offers a method that is more robust and accurate than existing robust kernel-based approaches, although it is an incremental improvement over forest-based methods.

The paper tackles robust density estimation in the presence of outliers by proposing MFRDE, an ensemble method that takes pointwise medians of forest density estimators. Even when the number of outliers grows up to a certain polynomial order in the sample size, MFRDE achieves almost the same convergence rate as on uncontaminated data, and it outperforms existing robust kernel-based methods in real-data experiments.

Robust density estimation refers to the consistent estimation of a density function even when the data are contaminated by outliers. We find that existing forest density estimation at a given point is inherently resistant to outliers lying outside the cells that contain the point, which we call non-local outliers, but not to the remaining outliers, which we call local outliers. To achieve robustness against all outliers, we propose an ensemble learning algorithm called medians of forests for robust density estimation (MFRDE), which applies a pointwise median to forest density estimators fitted on subsampled datasets. Compared with existing robust kernel-based methods, MFRDE allows larger subsampling sizes, sacrificing less density-estimation accuracy while still achieving robustness. On the theoretical side, we introduce the local outlier exponent to quantify the number of local outliers. Under this exponent, we show that even if the number of outliers reaches a certain polynomial order in the sample size, MFRDE achieves almost the same convergence rate as the same algorithm on uncontaminated data, whereas robust kernel-based methods fail to do so. On the practical side, real-data experiments show that MFRDE outperforms existing robust kernel-based methods. Finally, we apply MFRDE to anomaly detection to showcase a further application.
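The local/non-local distinction and the median step can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the tree construction (data-independent trees with midpoint splits on [0, 1]^d), the use of equal-sized disjoint random blocks as the subsamples, the default depth, tree count and block count, and the names PurelyRandomTree, ForestDensity and mfrde are all hypothetical choices made for illustration. The demo places five outliers at a point q = (0.1, 0.1) where the clean density is zero, so they are local outliers for q and non-local for every point in [0.5, 1]^2.

```python
import numpy as np


class PurelyRandomTree:
    """Hypothetical data-independent partition of [0, 1]^d: every internal node
    splits its cell at the midpoint of a uniformly chosen coordinate, so each
    of the 2**depth leaves has volume 2**-depth."""

    def __init__(self, d, depth, rng):
        self.depth = depth
        # Heap layout: node k has children 2k+1 and 2k+2; one split coordinate per internal node.
        self.dims = rng.integers(0, d, size=2**depth - 1)

    def leaf(self, X):
        """Leaf index in 0 .. 2**depth - 1 for each row of X (rows assumed in [0, 1]^d)."""
        X = np.atleast_2d(X)
        rows = np.arange(len(X))
        lo, hi = np.zeros(X.shape), np.ones(X.shape)
        node = np.zeros(len(X), dtype=int)
        for _ in range(self.depth):
            dim = self.dims[node]
            mid = 0.5 * (lo[rows, dim] + hi[rows, dim])
            right = X[rows, dim] >= mid
            lo[rows[right], dim[right]] = mid[right]    # go right: cell shrinks from below
            hi[rows[~right], dim[~right]] = mid[~right]  # go left: cell shrinks from above
            node = 2 * node + 1 + right
        return node - (2**self.depth - 1)


class ForestDensity:
    """Forest estimate: average over trees of (points in leaf) / (n * leaf volume)."""

    def __init__(self, d, depth=6, n_trees=25, seed=0):
        rng = np.random.default_rng(seed)
        self.trees = [PurelyRandomTree(d, depth, rng) for _ in range(n_trees)]
        self.vol = 2.0 ** -depth

    def fit(self, X):
        self.n = len(X)
        self.counts = [np.bincount(t.leaf(X), minlength=2**t.depth) for t in self.trees]
        return self

    def predict(self, Q):
        per_tree = [c[t.leaf(Q)] / (self.n * self.vol) for t, c in zip(self.trees, self.counts)]
        return np.mean(per_tree, axis=0)


def mfrde(X, Q, n_subsamples=15, depth=6, n_trees=25, seed=0):
    """Median-of-forests sketch: one forest per disjoint random block of X,
    then the pointwise median of the forests' estimates at the query points Q."""
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(len(X)), n_subsamples)
    estimates = [
        ForestDensity(X.shape[1], depth, n_trees, seed=int(rng.integers(2**31)))
        .fit(X[b]).predict(Q)
        for b in blocks
    ]
    return np.median(estimates, axis=0)


# Toy data: clean density 4 on [0.5, 1]^2, plus 5 outliers stacked at q = (0.1, 0.1).
rng = np.random.default_rng(1)
clean = rng.uniform(0.5, 1.0, size=(2000, 2))
X = np.vstack([clean, np.tile([0.1, 0.1], (5, 1))])
Q = np.array([[0.1, 0.1], [0.7, 0.7]])

plain = ForestDensity(2).fit(X).predict(Q)  # plain[0] = 5 / (2005 * 2**-6) ≈ 0.16
robust = mfrde(X, Q)                        # robust[0] == 0.0: at most 5 of 15 blocks see an outlier
```

In this toy run the plain forest reports positive density at q only because of the five local outliers, while the median is exactly zero there: each outlier falls into a single block, so at least 10 of the 15 block-wise forests never see one. At (0.7, 0.7) the same outliers are non-local (the root split at 0.5 always separates them), so they change the plain forest's estimate only through the normaliser n. In this disjoint-block sketch, the median stays clean only while the number of blocks exceeds twice the number of local outliers, and fewer, larger blocks give less noisy forests.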

