ML, LG · May 31, 2023

Distance Rank Score: Unsupervised filter method for feature selection on imbalanced dataset

Katarina Firdova, Céline Labart, Arthur Martel

arXiv:2305.19804v1

Originality: Incremental advance

AI Analysis

This work addresses feature selection for imbalanced datasets, which is crucial for applications like anomaly detection, but it appears incremental as it builds on existing filter methods with a specific adaptation.

The paper tackles unsupervised feature selection on imbalanced multi-class datasets, such as clusters of different anomaly types. It introduces a new filter method based on Spearman's Rank Correlation between distances on the observations and distances on feature values. This avoids the drawbacks of variance-based methods and shows improved performance in clustering tasks compared with existing methods.

This paper presents a new filter method for unsupervised feature selection. The method is particularly effective on imbalanced multi-class datasets, as in the case of clusters of different anomaly types. Existing methods usually rely on the variance of the features, which is not suitable when the different types of observations are not represented equally. Our method, based on Spearman's Rank Correlation between distances on the observations and on feature values, avoids this drawback. The performance of the method is measured on several clustering problems and is compared with existing filter methods suitable for unsupervised data.
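The abstract does not spell out the scoring formula, so the following is only a minimal sketch of one plausible reading: for each feature, correlate the ranks of pairwise distances between observations with the ranks of pairwise distances in that feature alone. The function names (`average_ranks`, `spearman`, `distance_rank_scores`), the Euclidean distance on observations, and the absolute difference on feature values are all assumptions made for illustration, not the authors' definitions.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        return 0.0  # constant input: rho is undefined, scored as 0 here
    return cov / math.sqrt(var_a * var_b)


def distance_rank_scores(X):
    """Score each column of X (a list of equal-length numeric rows)."""
    pairs = list(combinations(range(len(X)), 2))
    # Assumption: Euclidean distance over all features for observations.
    obs_dist = [math.dist(X[i], X[j]) for i, j in pairs]
    scores = []
    for f in range(len(X[0])):
        # Assumption: absolute difference as the distance on one feature.
        feat_dist = [abs(X[i][f] - X[j][f]) for i, j in pairs]
        scores.append(spearman(obs_dist, feat_dist))
    return scores


# Toy data: feature 0 separates two groups, feature 1 is constant.
X = [[0.0, 5.0], [1.0, 5.0], [10.0, 5.0], [11.0, 5.0]]
scores = distance_rank_scores(X)
```

Under this reading, a feature whose pairwise differences preserve the ordering of distances between observations scores close to 1, regardless of how many observations each cluster contains. Note that the sketch compares all pairs of observations, so its cost grows quadratically with the number of rows.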
