ML LG | Sep 4, 2017

Random Subspace with Trees for Feature Selection Under Memory Constraints

Antonio Sutera, Célia Châtel, Gilles Louppe, Louis Wehenkel, Pierre Geurts

arXiv:1709.01177v2 | 2.62 citations

Originality: Incremental advance

AI Analysis

This paper addresses feature selection in memory-limited settings, a problem relevant to machine learning practitioners, but it appears incremental, as it builds on existing tree-based methods.

The paper tackles feature selection for high-dimensional data under memory constraints. It proposes a tree-based method that builds a sequence of randomized trees on small subsets of variables, each subset mixing variables already identified as relevant with randomly selected ones. It provides a theoretical analysis, in the infinite-sample setting, of the method's soundness and convergence speed, along with preliminary empirical results.

Dealing with datasets of very high dimension is a major challenge in machine learning. In this paper, we consider the problem of feature selection in applications where memory is not large enough to hold all features. In this setting, we propose a novel tree-based feature selection approach that builds a sequence of randomized trees on small subsamples of variables, mixing both variables already identified as relevant by previous models and variables randomly selected among the other variables. As our main contribution, we provide an in-depth theoretical analysis of this method in the infinite-sample setting. In particular, we study its soundness with respect to common definitions of feature relevance and its convergence speed under various variable dependence scenarios. We also provide some preliminary empirical results highlighting the potential of the approach.
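The sequential procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: input variables are binary, each tree is totally randomized (one variable drawn at random at each node), importances are mean decrease of Shannon-entropy impurity, and the memory budget `q` is assumed to exceed the number of relevant variables. The names `grow_tree` and `sequential_rast` are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def grow_tree(X, y, rows, features, importances, rng, n_total):
    """Grow one totally randomized tree on binary features (assumption) and
    add each split's weighted impurity decrease to `importances` (MDI)."""
    h = entropy([y[i] for i in rows])
    if h == 0.0:  # pure node: stop splitting
        return
    # Only variables that still vary at this node can split it.
    candidates = [f for f in features if len({X[i][f] for i in rows}) > 1]
    if not candidates:
        return
    f = rng.choice(candidates)  # totally randomized: split variable drawn at random
    left = [i for i in rows if X[i][f] == 0]
    right = [i for i in rows if X[i][f] == 1]
    child_h = (len(left) * entropy([y[i] for i in left])
               + len(right) * entropy([y[i] for i in right])) / len(rows)
    importances[f] += len(rows) / n_total * (h - child_h)
    grow_tree(X, y, left, features, importances, rng, n_total)
    grow_tree(X, y, right, features, importances, rng, n_total)


def sequential_rast(X, y, q, n_trees, seed=0, eps=1e-12):
    """Hypothetical sequential random-subspace loop: each tree sees at most q
    variables, namely those found relevant so far plus random others.
    Assumes q exceeds the number of relevant variables."""
    rng = random.Random(seed)
    p = len(X[0])
    relevant = set()
    for _ in range(n_trees):
        others = [f for f in range(p) if f not in relevant]
        k = min(max(q - len(relevant), 0), len(others))
        subspace = sorted(relevant) + rng.sample(others, k)
        importances = {f: 0.0 for f in subspace}
        grow_tree(X, y, list(range(len(y))), subspace, importances, rng, len(y))
        # eps only absorbs floating-point rounding; with finite samples a real
        # threshold or statistical test would be needed (assumption).
        relevant |= {f for f, v in importances.items() if v > eps}
    return relevant


if __name__ == "__main__":
    # Exhaustive truth table over 8 binary variables: an exact uniform
    # distribution, standing in for the infinite-sample setting.
    rows = [[(i >> b) & 1 for b in range(8)] for i in range(256)]
    labels = [r[0] & r[1] for r in rows]  # y = X0 AND X1
    print(sequential_rast(rows, labels, q=3, n_trees=50))
```

On the exhaustive truth table, splitting on an irrelevant variable leaves the label distribution of both children unchanged, so such variables receive zero importance up to rounding, and the loop should typically recover `{0, 1}` once both relevant variables have been drawn into some subspace. Carrying the relevant variables forward is the point of the sequential design: under plain random subspace sampling, a given variable appears in a size-`q` subset with probability only `q/p` per tree.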

