An Effective Feature Selection Method Based on Pair-Wise Feature Proximity for High Dimensional Low Sample Size Data
The work addresses a specific bottleneck in machine learning for data-limited domains such as bioinformatics and medical imaging, but it is incremental, building on existing feature selection approaches.
The paper tackles feature selection for high-dimensional, low-sample-size (HDLSS) data by proposing a method based on pairwise feature proximity; in experiments on benchmark datasets, the method outperforms many state-of-the-art approaches.
Feature selection has been studied widely in the literature. However, the efficacy of selection criteria for low-sample-size applications has been neglected in most cases. Most existing feature selection criteria are based on sample similarity, yet distance measures lose much of their significance for high dimensional low sample size (HDLSS) data. Moreover, the variance of a feature estimated from only a few samples is of little use unless those samples represent the data distribution well. Instead of looking at the samples in groups, we evaluate the efficacy of features in a pairwise fashion. In our investigation, we found that considering one pair of samples at a time and selecting the features that bring them closer together or push them farther apart is a better choice for feature selection. Experimental results on benchmark datasets demonstrate the effectiveness of the proposed method in the low-sample-size setting, where it outperforms many other state-of-the-art feature selection methods.
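The pairwise idea can be illustrated with a minimal Python sketch. It is not the authors' criterion, which the abstract does not specify: the function names `pairwise_proximity_scores` and `select_top_k`, the use of class labels, the min-max scaling, and the score itself (mean per-feature gap over different-class pairs minus mean per-feature gap over same-class pairs) are all hypothetical choices made for illustration.

```python
from itertools import combinations

import numpy as np


def pairwise_proximity_scores(X, y):
    """Score each feature from sample pairs.

    Hypothetical criterion for illustration, not the paper's: a feature scores
    high when it keeps same-class pairs close and pushes different-class pairs apart.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Min-max scale each feature to [0, 1] so per-feature gaps are comparable.
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant features: avoid division by zero
    Xs = (X - lo) / span

    within = np.zeros(X.shape[1])
    between = np.zeros(X.shape[1])
    n_within = n_between = 0
    # Visit every pair of samples once: n(n-1)/2 pairs, cheap when n is small.
    for i, j in combinations(range(X.shape[0]), 2):
        gap = np.abs(Xs[i] - Xs[j])  # per-feature distance for this pair
        if y[i] == y[j]:
            within += gap
            n_within += 1
        else:
            between += gap
            n_between += 1

    # Average separation of different-class pairs minus average spread of same-class pairs.
    return between / max(n_between, 1) - within / max(n_within, 1)


def select_top_k(X, y, k):
    """Return the indices of the k highest-scoring features (hypothetical helper)."""
    scores = pairwise_proximity_scores(X, y)
    return np.argsort(scores)[::-1][:k]


if __name__ == "__main__":
    # Toy data: feature 0 separates the two classes, feature 1 is noise.
    X = np.array([[0.0, 0.3], [0.1, 0.9], [1.0, 0.2], [0.9, 0.8]])
    y = np.array([0, 0, 1, 1])
    print(pairwise_proximity_scores(X, y))
    print(select_top_k(X, y, 1))
```

Working on pairs rather than on class-level statistics such as variance means the cost grows as O(n²·d) in the number of samples n and features d, which stays affordable precisely in the low-sample-size regime the abstract targets.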