Interpreting Outliers: Localized Logistic Regression for Density Ratio Estimation
This work addresses the problem of interpretable outlier detection for high-dimensional data analysis, offering an incremental improvement over existing methods.
The paper tackles outlier detection by proposing a method that identifies outliers and explains them through outlier-specific features, using localized logistic regression for density ratio estimation. It demonstrates successful detection of important features in synthetic experiments and tends to outperform existing algorithms on benchmark datasets.
We propose an inlier-based outlier detection method that both identifies outliers and explains why they are outliers by identifying outlier-specific features. Specifically, we employ an inlier-based outlier detection criterion, which uses the ratio of inlier and test probability densities as a measure of plausibility of being an outlier. To estimate the density ratio function, we propose a localized logistic regression algorithm. Thanks to the locality of the model, variable selection can be outlier-specific, which helps interpret why points are outliers in a high-dimensional space. Through synthetic experiments, we show that the proposed algorithm can successfully detect the important features for outliers. Moreover, we show that the proposed algorithm tends to outperform existing algorithms on benchmark datasets.
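The density ratio criterion described above can be illustrated with a minimal sketch. It is not the proposed localized algorithm: it fits a single global, unregularized logistic regression that separates inlier samples (label 1) from test samples (label 0), then converts the classifier odds into a density ratio estimate via Bayes' rule, p_inlier(x) / p_test(x) = (n_test / n_inlier) * P(y=1 | x) / P(y=0 | x). The synthetic data, the function names (fit_logistic, density_ratio), and the learning rate and epoch count are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, lr=0.5, epochs=200):
    """Batch gradient descent on the mean logistic loss; the bias is stored last."""
    d, n = len(xs[0]), len(xs)
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for x, y in zip(xs, ys):
            z = sum(wj * xj for wj, xj in zip(w, x)) + w[d]
            err = sigmoid(z) - y
            for j in range(d):
                grad[j] += err * x[j]
            grad[d] += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def density_ratio(w, x, n_inlier, n_test):
    """Estimate p_inlier(x) / p_test(x) from the fitted classifier odds."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[len(x)]
    # exp(z) is the odds P(y=1|x) / P(y=0|x); rescale by the sample-size ratio.
    return (n_test / n_inlier) * math.exp(z)


# Hypothetical synthetic data: inliers are standard normal in 2-D; the test
# set mixes 400 such points with 100 outliers shifted by +6 in feature 0 only.
random.seed(0)
inliers = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(500)]
test_set = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(400)]
test_set += [[random.gauss(6, 1), random.gauss(0, 1)] for _ in range(100)]

xs = inliers + test_set
ys = [1] * len(inliers) + [0] * len(test_set)
w = fit_logistic(xs, ys)

r_typical = density_ratio(w, [0.0, 0.0], len(inliers), len(test_set))
r_outlier = density_ratio(w, [6.0, 0.0], len(inliers), len(test_set))
print("weights (feature 0, feature 1, bias):", w)
print("ratio at a typical point:", r_typical)
print("ratio at an outlying point:", r_outlier)
```

In this toy setting a lower estimated ratio marks a point as more outlier-like, and the weight on feature 0 should dominate because only that feature separates the outliers. A global linear model cannot give a different explanation for each outlier, and that is the limitation the locality of the proposed model is meant to address.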