Unsupervised Feature Selection Based on the Morisita Estimator of Intrinsic Dimension
This work addresses feature selection for data analysis, particularly for non-linear data, but it appears incremental, since it is presented as an advanced version of an existing fractal dimension reduction technique.
The paper tackles the problem of selecting the smallest subset of features without losing information. It proposes a new filter algorithm based on the Morisita estimator of intrinsic dimension. In tests on simulated and real-world data, the algorithm reduces data dimensionality significantly without loss of relevant information.
This paper deals with a new filter algorithm for selecting the smallest subset of features carrying all the information content of a data set (i.e., for removing redundant features). It is an advanced version of the fractal dimension reduction technique, and it relies on the recently introduced Morisita estimator of Intrinsic Dimension (ID). Here, the ID is used to quantify dependencies between subsets of features, which allows the effective processing of highly non-linear data. The proposed algorithm is successfully tested on simulated and real-world case studies. Different levels of sample size and noise are examined along with the variability of the results. In addition, a comprehensive procedure based on random forests shows that the data dimensionality is significantly reduced by the algorithm without loss of relevant information. Finally, comparisons with benchmark feature selection techniques demonstrate the promising performance of this new filter.
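To make the idea of using the ID to detect redundant features more concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes a grid-based Morisita index for pairs of points, I = Q * sum(n_i * (n_i - 1)) / (N * (N - 1)), where Q is the number of grid cells and n_i the number of points in cell i, and it estimates the ID as E minus the slope of log(I) against log(1/cell size), with E the number of features. The function names (morisita_index, morisita_id, backward_select), the grid sizes, and the tolerance tol are hypothetical choices made for illustration. The selection loop is a plain backward elimination in the spirit of fractal dimension reduction: a feature is dropped when removing it barely changes the estimated ID.

```python
import numpy as np


def morisita_index(X, n_cells):
    """Morisita index for pairs on a regular grid; X must lie in [0, 1]^E."""
    N, E = X.shape
    # Map each point to the integer coordinates of its grid cell.
    idx = np.minimum((X * n_cells).astype(int), n_cells - 1)
    # Count the points falling in each occupied cell.
    _, counts = np.unique(idx, axis=0, return_counts=True)
    Q = float(n_cells) ** E  # total number of cells, occupied or not
    return Q * np.sum(counts * (counts - 1)) / (N * (N - 1))


def morisita_id(X, cells=(2, 4, 8, 16)):
    """Estimate the ID as E minus the log-log slope of the index (assumed form)."""
    X = np.asarray(X, dtype=float)
    span = X.max(axis=0) - X.min(axis=0)
    span = np.where(span > 0, span, 1.0)  # guard against constant columns
    X = (X - X.min(axis=0)) / span  # rescale every feature to [0, 1]
    E = X.shape[1]
    log_inv_delta = np.log(cells)  # cell size is 1 / n_cells
    log_index = np.log([morisita_index(X, c) for c in cells])
    slope = np.polyfit(log_inv_delta, log_index, 1)[0]
    return E - slope


def backward_select(X, tol=0.2):
    """Drop features one at a time while the estimated ID stays within tol."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        full = morisita_id(X[:, keep])
        # ID lost when each remaining feature is removed in turn.
        losses = [
            full - morisita_id(X[:, [j for j in keep if j != f]]) for f in keep
        ]
        best = int(np.argmin(losses))
        if losses[best] > tol:
            break  # every remaining feature carries information
        keep.pop(best)
    return keep


# Toy data: two independent features plus a non-linear copy of the first.
rng = np.random.default_rng(0)
x1 = rng.random(2000)
x2 = rng.random(2000)
X_demo = np.column_stack([x1, x2, x1 ** 2])  # third column is redundant
selected = backward_select(X_demo)
print("estimated ID of all features:", round(morisita_id(X_demo), 2))
print("kept feature indices:", selected)
```

In this toy set the third column is a deterministic, non-linear function of the first, so removing it should leave the estimated ID almost unchanged, whereas removing the independent second column should lower it by about one. With few points or many features, fine grids can leave no cell with two points, which makes the logarithm undefined; the grid sizes would then need to be coarser.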