Selecting Features by their Resilience to the Curse of Dimensionality
This work addresses feature selection for high-dimensional data, with the goal of improving comprehensibility and interpretability. It is an incremental advance that incorporates the curse of dimensionality into the selection process.
The paper tackles the problem of high-dimensional datasets hindered by the curse of dimensionality. It proposes a novel feature selection method, based on intrinsic dimensionality, that identifies the features allowing data subsets to be discriminated, and it shows competitive and often superior performance compared to established methods.
Real-world datasets are often of high dimension and affected by the curse of dimensionality. This hinders their comprehensibility and interpretability. To reduce this complexity, feature selection aims to identify the features that are crucial for learning from the data. While measures of relevance and pairwise similarities are commonly used, the curse of dimensionality is rarely incorporated into the process of selecting features. Here we step in with a novel method that identifies the features that make it possible to discriminate data subsets of different sizes. By adapting recent work on computing intrinsic dimensionalities, our method selects the features that can discriminate data and thus weaken the curse of dimensionality. Our experiments show that our method is competitive and commonly outperforms established feature selection methods. Furthermore, we propose an approximation that allows our method to scale to datasets consisting of millions of data points. Our findings suggest that features that discriminate data and are connected to a low intrinsic dimensionality are meaningful for learning procedures.
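To make the idea of "discriminating data subsets of different sizes" more concrete, the following is a minimal Python sketch of one possible reading, not the method from the paper. It assumes that a feature discriminates subsets of size k well if even the tightest group of k points still has a large spread in that feature, and it averages this minimal spread over the chosen subset sizes. The function names `discrimination_score` and `rank_features`, the min-max scaling, and the `sizes` parameter (a crude stand-in for restricting the computation to fewer subset sizes on large data) are all assumptions made for illustration.

```python
import numpy as np

def discrimination_score(values, sizes=None):
    """Average, over subset sizes k, of the smallest spread that any
    k points have in this feature (hypothetical score, for illustration)."""
    v = np.sort(np.asarray(values, dtype=float))
    n = v.size
    if sizes is None:
        sizes = range(2, n + 1)  # all subset sizes from 2 to n
    total = 0.0
    for k in sizes:
        # After sorting, the tightest subset of size k is a window of
        # k consecutive values, so it suffices to scan those windows.
        total += np.min(v[k - 1:] - v[: n - k + 1])
    return total / len(sizes)

def rank_features(X, sizes=None):
    """Return feature indices sorted from most to least discriminating,
    together with the per-feature scores."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant features: avoid division by zero
    Z = (X - lo) / span    # assumption: min-max scale each feature to [0, 1]
    scores = np.array(
        [discrimination_score(Z[:, j], sizes) for j in range(Z.shape[1])]
    )
    return np.argsort(-scores), scores

# Toy data: an evenly spread feature, a mostly collapsed one, a constant one.
X_demo = np.column_stack([
    np.linspace(0.0, 1.0, 11),
    np.array([0.0] * 10 + [1.0]),
    np.full(11, 3.0),
])
order, scores = rank_features(X_demo)
```

In this toy example the evenly spread feature ranks first, because every group of k points is forced to cover a noticeable range, while the constant feature scores zero. Passing a short list such as `sizes=[2, 6, 11]` evaluates only those subset sizes, which reduces the cost but is only a rough illustration of how such a computation might be thinned out.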