Context-dependent feature analysis with random forests
This work addresses feature-selection challenges faced by data scientists working with complex interactions; however, it appears incremental, as it builds on the existing random forest framework.
The paper tackles the problem of identifying context-dependent feature interactions in datasets by extending random forest variable importances, and demonstrates its relevance on artificial and real datasets.
Feature selection is often more complicated than identifying a single subset of input variables that together explain the output. There may be interactions that depend on contextual information, i.e., variables that turn out to be relevant only in specific circumstances. In this setting, the contribution of this paper is to extend the random forest variable importance framework in order (i) to identify variables whose relevance is context-dependent and (ii) to characterize as precisely as possible the effect of contextual information on these variables. The usage and relevance of our framework for highlighting context-dependent variables are illustrated on both artificial and real datasets.
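To make goals (i) and (ii) concrete, here is a minimal Python sketch under stated assumptions. It is not the paper's method: instead of random forest variable importances, it uses empirical mutual information as a stand-in importance score. The score is computed separately for each value of a context variable, and a feature is flagged as context-dependent when that score varies across contexts. The toy dataset and every name in it (`context_importances`, `is_context_dependent`, `xc`, `x1`, `x2`, `y`) are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2

# Assumption: empirical mutual information stands in for forest-based
# variable importances; this is an illustrative proxy, not the paper's method.


def entropy(values):
    """Empirical Shannon entropy, in bits, of a list of hashable values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """Empirical I(X; Y) = H(X) + H(Y) - H(X, Y), in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def context_importances(rows, feature, target, context):
    """Return {c: I(feature; target | context = c)} for each observed context value c."""
    scores = {}
    for c in sorted({r[context] for r in rows}):
        subset = [r for r in rows if r[context] == c]
        scores[c] = mutual_information([r[feature] for r in subset],
                                       [r[target] for r in subset])
    return scores


def is_context_dependent(per_context, tol=1e-6):
    """Flag a feature whose importance changes across context values."""
    values = list(per_context.values())
    return max(values) - min(values) > tol


# Hypothetical toy dataset: uniform over all binary (xc, x1, x2); the output
# copies x1 when the context xc is 1 and copies x2 when xc is 0.
rows = [{"xc": xc, "x1": x1, "x2": x2, "y": x1 if xc == 1 else x2}
        for xc, x1, x2 in product((0, 1), repeat=3)]

# Per-context scores characterize how the context changes each feature's relevance.
scores = {f: context_importances(rows, f, "y", "xc") for f in ("x1", "x2")}
for f, per_context in scores.items():
    overall = mutual_information([r[f] for r in rows], [r["y"] for r in rows])
    print(f, "overall:", round(overall, 3), "per context:", per_context,
          "context-dependent:", is_context_dependent(per_context))
```

On this toy data, `x1` scores 1 bit when `xc` is 1 and 0 bits when `xc` is 0, while its context-free score is only about 0.19 bits, so a single global ranking understates how decisive it is in one context. The context variable `xc` itself has zero mutual information with `y` here, which illustrates why contextual variables can be missed by purely marginal relevance measures.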